I. Introduction


In many applications, such as high-speed digital signal processing, reliable
simulation of dynamical systems, and digital implementation and simulation of
chaotic systems, the effects of finite wordlength are a critical issue. The actual
digital computer implementation of a given ideal dynamical system can be
characterized by several open parameters that have a critical impact on the
performance of the implemented algorithm:

1.  The  realization  (in  the  linear  case,  the  system  matrices):  This  determines  the 
coefficients  involved,  the  order  of  computation,  etc.  There  are  infinitely  many 
realizations  for  implementing  the  same  dynamical  system. 

2.  The  arithmetic  format:  This  determines  the  type  of  arithmetic  used  (fixed- 
point,  floating-point,  etc.),  the  register  lengths,  and  the  type  of  quantization 
operations  in  the  reformatting  processes. 

For the class of linear, time-invariant systems, δ-operator based implementations
were shown to outperform their q-operator based counterparts if the sampling time
is chosen sufficiently small [1,2]. These advantages of the δ-operator, especially
in high-speed, real-time applications, were demonstrated with respect to
quantization noise at the system output and differential sensitivity of the
frequency response with respect to the coefficients of the system realization
[1-4]. In addition, the use of δ-operators allows a unified treatment of both the
continuous- and discrete-time cases. These properties make δ-operator based
systems an attractive alternative to conventional system realizations.

However, a number of questions on δ-operator based implementations remained
unanswered:

1. Do the advantages that have been demonstrated for the linear time-invariant
case carry over to the 2-D, m-D, and possibly nonlinear cases? If so, δ-operator
based numerical schemes can provide a completely novel, simple, widely
applicable, yet more reliable methodology for system simulation and realization.
The fundamental importance of such an investigation was identified at the
very outset by the PI.

2. What about asymptotic stability of δ-operator systems and the possibility of
limit cycles? Although quantization noise at the output was shown to be
smaller for δ-system realizations, this does not automatically preclude the
existence of limit cycles. In fixed-point implementations, the existence of
prohibitively large limit cycles was evident. Although such behavior is
unacceptable in almost all applications, no attention had been directed towards
this seemingly generic phenomenon of δ-systems.

The above questions are at the core of this research project. This report, which
describes the work carried out under this research project, is structured as
follows: In Section II, a brief description of the proposed tasks is outlined.
In Section III, the results obtained are briefly described on a qualitative level
for each of the problem areas tackled. Section IV offers conclusions and summarizes
the accomplishments and their significance. Section V contains pertinent references.

More detailed technical descriptions of the results in Section III may be found
in several technical papers and presentations, which are included in Appendix A.
It contains all material that has already been published in or submitted to
journals or conferences, as well as material (such as presentations, summaries,
etc.) that has been submitted to ONR. Appendix B contains those technical papers
and presentations that have some peripheral relevance to the proposed research,
and those in which ONR support is acknowledged.
II. Brief Description of Tasks


The  proposed  work  was  divided  into  three  major  tasks: 

T1: Analysis and design of finite wordlength implementations of linear
time-invariant δ-systems.

T2: Analysis of nonlinear circuits through δ-operator based schemes.

T3: 2-D and m-D δ-system models.

Task T1 reveals some fundamental difficulties in the implementation of δ-systems
with fixed-point arithmetic. It focuses mainly on zero convergence of the
free system response and exposes the existence of limit cycles as well as the
effects of quantizing the sampling time Δ.

Task T2 is a study of whether the superior finite wordlength properties associated
with certain linear time-invariant system realizations also extend to nonlinear
systems. This work was mainly motivated by some very promising simulation results
of chaotic systems.

Task T3 develops the formalisms for 2-D and m-D system descriptions in δ-operator
form. It also investigates sensitivity properties of these proposed m-D
δ-models and compares them with conventional q-models.
III. Results and Accomplishments


This section offers brief qualitative descriptions of the results obtained during
this project period. A more rigorous quantitative analysis of these results is to
be found in Appendix A, which contains all relevant technical papers/presentations.

III.1. Task T1: Analysis and Design of Finite Wordlength Implementations
of Linear Time-Invariant δ-Systems

We have exposed a serious limitation of δ-operator based realizations of
discrete-time systems: They cannot be free of limit cycles when used with small
sampling times and fixed-point arithmetic! In particular, DC limit cycles are
always present when the sampling time is smaller than 0.5 for rounding, and 1.0
for truncation. In other words, under these conditions, nonzero initial conditions
can be found such that the asymptotic response converges to an incorrect
equilibrium point different from the origin [5]. This is in fact a generic problem
with δ-systems in the sense that it is independent of the margin of stability (of
the ideal linear system) and of its realization. The main cause lies in the update
equation, where multiplication by the sampling time (which is typically small)
occurs. This results in a difference vector that quantizes to zero.
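The deadband mechanism described above can be illustrated with a minimal sketch. It assumes a scalar δ-system with state equation δx = a·x and update equation x ← x + Δ·δx, an integer quantization grid (step 1), and round-half-up quantization after each multiplication. The function names and parameter values are illustrative assumptions and are not taken from [5].

```python
import math


def round_half_up(value):
    """Quantize to the nearest integer (grid step 1), ties rounded upward."""
    return math.floor(value + 0.5)


def delta_fixed_point(a_delta, step, x0, n_steps):
    """Free response of a scalar delta-system with both products quantized."""
    x = x0
    trajectory = [x]
    for _ in range(n_steps):
        dx = round_half_up(a_delta * x)    # state equation: delta x = a * x
        x = x + round_half_up(step * dx)   # update equation: x <- x + step * delta x
        trajectory.append(x)
    return trajectory


def delta_ideal(a_delta, step, x0, n_steps):
    """Same recursion in double precision, used here as the ideal reference."""
    x = float(x0)
    trajectory = [x]
    for _ in range(n_steps):
        x = x + step * (a_delta * x)
        trajectory.append(x)
    return trajectory


if __name__ == "__main__":
    fixed = delta_fixed_point(a_delta=-1.0, step=0.25, x0=3, n_steps=50)
    ideal = delta_ideal(a_delta=-1.0, step=0.25, x0=3, n_steps=50)
    print("fixed-point:", fixed[:5], "final:", fixed[-1])
    print("ideal final:", ideal[-1])
```

With these assumed values the quantized increment becomes zero once the state reaches 2, so the fixed-point trajectory stays at 2 while the ideal response decays toward the origin.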

The use of novel quantization schemes with smaller deadzones was also shown
to be ineffective: Although quantizers that significantly reduce DC limit cycle
amplitude may be selected, new oscillatory limit cycles are usually created. A
newly developed computer-aided search algorithm for the existence of limit cycles
may be effectively used to investigate this phenomenon [6]. Through construction
of deadband regions and simple bounding hypercubes, these limit cycle amplitudes
have been shown to grow with increasing sampling rate [6,7]. Using results on
necessary 1-D conditions for stability of m-D systems [8], m-D δ-systems were also
shown to produce similar limit cycle behavior [7,9].

Another drawback of fixed-point δ-operator implementations is the required
high dynamic range of coefficients and signals. This is due to the fact that,
given a q-system, obtaining the corresponding δ-system involves a division by Δ
(which is typically small). Hence, additional bits in coefficient/signal registers
are generally required to avoid overflow [10].
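For a linear state-space model, this division can be written out explicitly. The relations below are the standard q-to-δ conversion; the symbols $A_q$, $B_q$, $A_\delta$, $B_\delta$ are assumed names introduced only for this illustration and do not appear in the report.

```latex
\begin{aligned}
x_{k+1} &= A_q x_k + B_q u_k, \\
\delta x_k &= \frac{x_{k+1} - x_k}{\Delta} = A_\delta x_k + B_\delta u_k,
\qquad A_\delta = \frac{A_q - I}{\Delta}, \quad B_\delta = \frac{B_q}{\Delta}.
\end{aligned}
```

For a fixed q-system, the entries of $B_\delta$ and the off-diagonal entries of $A_\delta$ therefore scale as $1/\Delta$, which is where the extra register bits are needed.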

The above investigations produced the following unavoidable conclusion: Since
δ-operator formulated discrete-time systems are superior to their q-operator
counterparts only when the sampling time is chosen to be significantly smaller
than one, fixed-point arithmetic is not a suitable format for δ-system
implementations.

The situation is refreshingly different in floating-point arithmetic: The
above-mentioned problems (encountered under fixed-point arithmetic) vanish and
δ-systems produce significant advantages under high-speed conditions. We show that,
under floating-point format, a stable linear system (independent of realization)
can always be implemented limit cycle free in the regular dynamic range [11].
Equivalently, limit cycles can always be restricted to underflow conditions. Such
limit cycles are acceptable for most applications. Furthermore, the large dynamic
range requirements of δ-systems may easily be accommodated in floating-point
arithmetic.

For both fixed and floating-point systems, new differential sensitivity measures
which are widely applicable even to nonlinear and time-variant systems were
developed [10,12]. Instead of using sensitivity measures related to frequency
response (as is the usual practice), a time domain approach using state-space
methods was developed. Sensitivity of the state trajectory with respect to system
coefficients and initial conditions was investigated. For linear time-invariant
systems, δ-operator based implementations have been shown to yield lower (by a
factor Δ) sensitivity than their q-operator based counterparts. Sensitivities with
respect to initial conditions are shown to be identical for both implementations.
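A rough numerical illustration of why the coefficient parametrization matters is sketched below. It is not the differential sensitivity measure of [10,12]; it only rounds one scalar coefficient to IEEE single precision in each form and compares how accurately the underlying δ-coefficient survives. The values `a_delta = -1.2345678` and `step = 1e-4`, and the helper names, are assumptions made for the example.

```python
import struct


def to_float32(value):
    """Round a Python float to the nearest IEEE-754 single-precision value."""
    return struct.unpack("f", struct.pack("f", value))[0]


def relative_errors(a_delta, step):
    """Relative error in a_delta after storing each form in single precision."""
    # delta-form: the coefficient a_delta is stored directly.
    stored_delta = to_float32(a_delta)
    err_delta = abs(stored_delta - a_delta) / abs(a_delta)

    # q-form: the stored coefficient is a_q = 1 + step * a_delta, which is close to 1,
    # so most mantissa bits are spent on the leading "1" rather than on a_delta.
    stored_q = to_float32(1.0 + step * a_delta)
    recovered = (stored_q - 1.0) / step
    err_q = abs(recovered - a_delta) / abs(a_delta)
    return err_delta, err_q


if __name__ == "__main__":
    err_delta, err_q = relative_errors(a_delta=-1.2345678, step=1e-4)
    print("relative error, delta-form:", err_delta)
    print("relative error, q-form:    ", err_q)
```

In this assumed example the relative error of the q-form is orders of magnitude larger than that of the δ-form, which is in line with the direction of the factor-Δ advantage stated above.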

III.2. Task T2: Analysis of Nonlinear Circuits Through δ-Operator
Based Schemes

The following aspects of nonlinear δ-systems were addressed in detail:

(a) Sensitivity of state response with respect to coefficients of the nonlinear
equation: This analysis was carried out for various types of nonlinearities as
well as for both fixed and floating-point schemes [10,12].

(b) Bounds on quantization error magnitudes, required dynamic range, and
construction of majorant systems for the response of δ-operator based
implementations of nonlinear systems [10].

In  part  (a),  the  concept  of  differential  sensitivity  of  state  response  with  respect 
to  coefficients  of  the  nonlinear  equation  was  developed.  The  proposed  sensitivity 
measures  were  evaluated  for  linear  systems,  piecewise  linear  systems,  systems  with 
nonlinearities  and  systems  with  piecewise  nonlinearities. 

For all these types of nonlinear systems, sensitivity of a δ-system with respect
to coefficients was shown to be smaller (by a factor Δ) than that of its
corresponding q-system under fixed-point arithmetic. For piecewise linear and
piecewise nonlinear systems, development of a quantitative measure for sensitivity
of the state trajectory with respect to initial conditions was required as well.
This is due to how the piecewise characteristics of the nonlinearity are modelled.
This proposed sensitivity measure was shown to be comparable for both q- and
δ-systems.

Of course, nonlinear δ-systems implemented in fixed-point arithmetic can be
shown to suffer from the same generic problem: Existence of incorrect equilibria.
Since this is a serious problem, especially in implementation and simulation
of nonlinear systems, floating-point arithmetic was also intensively analyzed as
an alternative. Suitable sensitivity measures were developed and evaluated for the
nonlinear system types mentioned above. A comparison with corresponding q-operator
based systems revealed what we believe to be a very important observation: Under
mild conditions on the coefficients of the q-system, the state trajectory of the
corresponding δ-system is less sensitive than that of the q-system. These
conditions turn out to be routinely satisfied if the nonlinear discrete-time system
is obtained through sampling of a given continuous-time system with a high
sampling rate!

In part (b), a comparison between q- and δ-systems was conducted via quantization
error bounds. For the fixed-point case, the q-system is always inferior to the
δ-system, i.e., it produces larger quantization error bounds whenever single-length
accumulators are used or when the number of computations in the state equation
significantly exceeds the number of computations in the update equation.

Systems with polynomial type nonlinearities were investigated in great detail.
For this class of nonlinearities, recommendations are made for the sampling rate
that provides an optimal balance between (a) the gains obtained from a reduced
sampling period, and (b) the increased expense of a higher sampling frequency.
For sector bounded nonlinearities, majorant systems for the state response were
constructed. When the sampling time is much smaller than 1, at each time instant,
these majorant systems for δ-systems produce smaller state responses than those
corresponding to q-systems.

For floating-point arithmetic, δ-systems produce smaller quantization error
bounds than corresponding q-systems only when the nonlinearities satisfy certain
magnitude conditions relative to the state vector. It was shown that these
conditions are always satisfied if the discrete-time system is produced by sampling
the underlying continuous-time system at a very high rate. Note that this is in
accordance with our previous results on sensitivity. The underlying reason for
these advantages of δ-systems is their implicit operand sorting. In other words,
operands of similar ‘size’ are grouped together in the state equation of δ-systems,
whereas, in the q-operator case, such a grouping is not implicit and a mix of
operands of different ‘sizes’ is created.

III.3. Task T3: 2-D and m-D δ-System Models


In this task, the δ-operator counterpart to the 2-D Roesser q-model was developed
[13]. It was shown that, for small sampling ‘times’ in both directions of
propagation, the proposed 2-D and m-D models possess properties similar to those
of the 1-D model. For example, fixed-point implementations are still plagued by
limit cycles and are not recommended; however, floating-point implementations can
yield extremely attractive finite wordlength properties.

The usual system theoretic notions such as characteristic equation, transfer
function, stability [14], etc., have been developed for the proposed 2-D δ-models.
Furthermore, the notions of gramians and balanced realizations, and also their
computation, were introduced.

To investigate coefficient sensitivity properties of 2-D models, sensitivity
measures appropriate for fixed and floating-point arithmetic schemes were developed.
This analysis was carried out for the more general multi-input, multi-output case.
The resulting conclusions may be summarized as follows:

1. In the fixed-point case, δ-models yield smaller coefficient sensitivity than
the corresponding q-models when the sampling ‘times’ are small. Balanced
realizations exhibit minimum coefficient sensitivity. This parallels the situation
encountered in the q-operator case. However, note that generic limit cycle
problems persist.

2. In the floating-point case, δ-models consistently offer superior coefficient
sensitivity when the corresponding q-models’ coefficients satisfy certain mild
conditions. These conditions are routinely satisfied when implementing high-Q
digital filters at high speeds. In most situations, an advantage of 2-4 mantissa
bits is possible.

Furthermore, computation of balanced realizations has also been addressed.
A simple relationship between balanced forms of corresponding q- and δ-system
realizations has been established [13]. This makes it possible to derive balanced
realizations using those algorithms that are applicable to the q-operator case.
IV. Conclusion


In summary, results obtained during the course of this funding period show that
δ-operator implementations of discrete-time systems can be quite superior to their
q-operator counterparts if they are used correctly. We have shown that great gains
can be achieved when a continuous-time system is sampled at a very high rate and
is implemented in floating-point arithmetic. Similar comments are applicable to
nonlinear and m-D systems as well.

Based on this work, we may draw the following conclusion: δ-operator based
implementations offer a number of unique and desirable properties which are
essential in high performance applications, such as high-speed DSP and reliable
simulations of dynamical systems. For such applications (where traditional
q-operator based implementations are known to be ill-conditioned), δ-operator based
schemes provide a general and easily applicable technique for reliable
implementation of discrete-time systems.
Abstract. The recent interest in delta-operator (or, δ-operator) formulated discrete-time systems (or,
δ-systems) is due mainly to (a) their superior finite wordlength characteristics as compared to their more
conventional shift-operator (or, q-operator) counterparts (or, q-systems), and (b) the possibility of a more
unified treatment of both continuous- and discrete-time systems. With such advantages, design, analysis,
and implementation of two-dimensional (2-D) discrete-time systems using the δ-operator is indeed
warranted. Towards this end, the work in this paper addresses the development of an easily implementable
direct algorithm for stability checking of 2-D δ-system transfer function models. Indirect methods that
utilize transformation techniques are not pursued since they can be numerically unreliable. In developing
such an algorithm, a tabular form for stability checking of δ-system characteristic polynomials with
complex-valued coefficients and certain quantities that may be regarded as their corresponding Schur-Cohn
minors are also proposed.

Keywords. Two-dimensional discrete-time systems, two-dimensional digital filters, δ-operator formulated
discrete-time systems, bivariate polynomials, Schur-Cohn minors, stability.
1. Introduction


The increased interest in δ-systems during recent years (see [1-6], and references
therein) is due mainly to two reasons: (a) δ-systems provide superior finite wordlength
properties with respect to roundoff noise propagation [5] and coefficient sensitivity [1], [5],
[7], as compared to their q-system counterparts, and (b) the δ-operator yields the
differential operator as a limiting case when the sampling time approaches zero, enabling a
unified treatment of both continuous- and discrete-time systems [1].

With such advantages in mind, development of 2-D and multi-dimensional (m-D)
δ-system models must clearly be undertaken. Such research can, for example, provide m-D
digital filters with superior roundoff error and coefficient sensitivity performance, allowing
their implementation to be carried out in a shorter wordlength environment. This is
especially crucial in real-time applications, such as implementing narrow bandwidth
filters under high sampling rates (for example, in current wide bandwidth communication
system applications), where traditional q-operator implementations perform poorly [8].

In the applications mentioned above, and those dealing with high-speed processing of 2-D
and m-D data (for instance, in weather, seismic, gravitational photographs, video images,
systems with multiple sampling rates, etc.), ensuring stability is an important consideration
(see [9], and references therein). Given the characteristic polynomial of a δ-system, to
determine stability, one may first use a variable transformation that yields a more familiar
stability region, for instance, the unit bi-circle. Then, an existing technique (see [9-10],
and references therein) may be applied. However, such techniques are known to be prone
to numerical ill-conditioning [1], [6]. In the 1-D case, direct stability checking methods
for δ-system polynomials are in [6] (where a tabular method based on the work in [11]
is given) and [12] (where a Hermite-Bieler-like Theorem is utilized). Hence, our purpose
here is to develop a direct, easily implementable stability checking technique applicable
to m-D δ-systems. As usual, for notational simplicity, we concentrate on the 2-D case, the
extension to the m-D case being quite straightforward.

In checking stability of bivariate characteristic polynomials, two conditions must be
satisfied.

(a) Condition I involves a 1-D stability check of a polynomial with real-valued coefficients.
One may use the table form in [6]. Alternatively, one may utilize an explicit root location
scheme.

(b) Condition II involves a stability check of a polynomial with complex-valued coefficients
where the latter are dependent on a parameter taking values on a certain circle in the
complex plane. Explicit root location schemes are now ineffective, and the value of tabular
methods becomes apparent. Note that, in such a situation, compared to Nyquist-like
techniques [13], tabular methods are known to provide certain numerical advantages as
well [14].

In checking condition II for 2-D q-systems, an effective technique involves checking
positive definiteness of the Hermitian Schur-Cohn matrix [15]. This lets one use an
important simplification due to Siljak [16]. The tabular form in [15] makes full use of this
since it provides the Schur-Cohn minors (that is, the principal minors of the Hermitian
Schur-Cohn matrix) directly from its entries [15], [17]. A similar simplification applicable
to δ-systems is clearly possible if condition II can be reduced to checking positive
definiteness of a Hermitian matrix.

With the above in mind, we develop the following in this paper: (a) a tabular form
for stability checking of δ-system characteristic polynomials possessing complex-valued
coefficients, (b) analogs of Schur-Cohn minors and a corresponding Hermitian matrix
applicable to such systems, and (c) a direct stability checking algorithm for 2-D δ-system
transfer function models.

The paper is organized as follows. Section 2 introduces the notation used throughout
and a brief review of previous results. Section 3 develops a tabular form for stability
checking of δ-systems with complex-valued coefficients and some important relevant results.
Section 4 presents quantities that may be regarded as the analogs of Schur-Cohn minors
for δ-systems. The 2-D stability checking algorithm in Section 5 is based on the tabular
form for real-valued coefficients [6]. Since only little extra work is needed, the results in
both Sections 3 and 4 are, however, developed for the more general complex-coefficient case.
Section 6 presents an example to validate the results. Section 7 contains the conclusion
and some final remarks.
2. Preliminaries


2.1.  Notation 


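$\mathbb{R}$, $\mathbb{C}$  Real and complex number fields.

$\mathbb{R}^{p \times q}$, $\mathbb{C}^{p \times q}$  Set of matrices of size $p \times q$ over $\mathbb{R}$ and $\mathbb{C}$, respectively.

$\mathrm{var}\{\cdot\}$  Number of sign changes in the sequence $\{\cdot\}$ of real numbers.

$\mathrm{Re}[\cdot]$, $\mathrm{Im}[\cdot]$  Real part and imaginary part of $[\cdot] \in \mathbb{C}$.

$\overline{[\cdot]}$  Complex conjugate of $[\cdot] \in \mathbb{C}$.

$A^{T}$, $\bar{A}$, $A^{*}$  Transpose, complex conjugate, and complex conjugate transpose of $A \in \mathbb{C}^{p \times q}$, respectively.

$\mathbb{R}[w]_n$, $\mathbb{C}[w]_n$  Set of univariate polynomials of degree $n$ (with respect to the indeterminate $w \in \mathbb{C}$) over $\mathbb{R}$ and $\mathbb{C}$, respectively.

$\mathbb{R}(w)$  Set of rational univariate polynomials (that is, quotients of univariate polynomials) over $\mathbb{R}$.

$\mathbb{R}[w_1]_{n_1}[w_2]_{n_2}$  Set of bivariate polynomials of relative degrees $n_1$ and $n_2$ (with respect to the indeterminates $w_1 \in \mathbb{C}$ and $w_2 \in \mathbb{C}$, respectively) over $\mathbb{R}$.

$\mathbb{R}(w_1, w_2)$  Set of rational bivariate polynomials over $\mathbb{R}$.

$z$, $c$  Indeterminates of q- and δ-systems, respectively.

$\tau$  Real positive number, usually the sampling time.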


The transformation relationship between corresponding q- and δ-systems is

\[
\delta = \frac{q - 1}{\tau}, \qquad c = \frac{z - 1}{\tau}. \tag{2.1}
\]

$\hat{[\cdot]}$  q-system quantity analogous to its corresponding δ-system quantity $[\cdot]$; for example, the transfer function of a given discrete-time system is either $H(c)$ if implemented based on the δ-operator or $\hat{H}(z)$ if implemented based on the q-operator.

$H(c)|_{c \to z}$  $H(c)|_{c = (z-1)/\tau}$

$\hat{G}(z)|_{z \to c}$  $\hat{G}(z)|_{z = 1 + \tau c}$

$H(c_1, c_2)|_{c \to z}$  $H(c_1, c_2)|_{c_i = (z_i - 1)/\tau,\ i = 1, 2}$

$\hat{G}(z_1, z_2)|_{z \to c}$  $\hat{G}(z_1, z_2)|_{z_i = 1 + \tau c_i,\ i = 1, 2}$
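As a small sketch of how the change of variable c = (z - 1)/τ is used, the snippet below maps q-system roots to the δ-domain and tests membership in the disc |c + 1/τ| < 1/τ, which is the image of the open unit disc under this map (since |c + 1/τ| = |z|/τ). The function names are hypothetical and chosen only for this illustration.

```python
def z_to_c(z, tau):
    """Map a q-domain point z to the delta-domain point c = (z - 1) / tau."""
    return (z - 1) / tau


def c_to_z(c, tau):
    """Inverse map: z = 1 + tau * c."""
    return 1 + tau * c


def in_open_delta_disc(c, tau):
    """True if c lies in the open disc of radius 1/tau centred at -1/tau."""
    return abs(c + 1 / tau) < 1 / tau


if __name__ == "__main__":
    tau = 0.1
    for z in (0.5 + 0.5j, 1.2):
        c = z_to_c(z, tau)
        print(z, "->", c, "inside disc:", in_open_delta_disc(c, tau))
```

As τ decreases, the disc grows and its boundary near the origin flattens toward the imaginary axis, which reflects the continuous-time limit mentioned in the Introduction.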
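Stability studies of 1-D and 2-D q- and δ-systems involve the following regions:

$U_q$, $U_q^2$  $\{z \in \mathbb{C} : |z| < 1\}$, $\{(z_1, z_2) \in \mathbb{C}^2 : |z_i| < 1,\ i = 1, 2\}$.

$\bar{U}_q$, $\bar{U}_q^2$  $\{z \in \mathbb{C} : |z| \le 1\}$, $\{(z_1, z_2) \in \mathbb{C}^2 : |z_i| \le 1,\ i = 1, 2\}$.

$T_q$, $T_q^2$  $\{z \in \mathbb{C} : |z| = 1\}$, $\{(z_1, z_2) \in \mathbb{C}^2 : |z_i| = 1,\ i = 1, 2\}$.

$U_\delta$, $U_\delta^2$  $\{c \in \mathbb{C} : |c + 1/\tau| < 1/\tau\}$, $\{(c_1, c_2) \in \mathbb{C}^2 : |c_i + 1/\tau| < 1/\tau,\ i = 1, 2\}$.

$\bar{U}_\delta$, $\bar{U}_\delta^2$  $\{c \in \mathbb{C} : |c + 1/\tau| \le 1/\tau\}$, $\{(c_1, c_2) \in \mathbb{C}^2 : |c_i + 1/\tau| \le 1/\tau,\ i = 1, 2\}$.

$T_\delta$, $T_\delta^2$  $\{c \in \mathbb{C} : |c + 1/\tau| = 1/\tau\}$, $\{(c_1, c_2) \in \mathbb{C}^2 : |c_i + 1/\tau| = 1/\tau,\ i = 1, 2\}$.

To avoid unnecessary notational complications, the sampling time in both horizontal and
vertical directions is taken to be equal to $\tau > 0$.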

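To emphasize the degree of $F(w) = \sum_{k=0}^{n} f_k w^k$, we sometimes denote it
as $F(w)_n$ as well.

$\bar{F}(w)$  Conjugate polynomial of $F(w)$, that is, $\sum_{k=0}^{n} \bar{f}_k w^k$.

$\hat{F}^{\sharp}(z)$  Reciprocal polynomial of $\hat{F}(z)$, that is, $z^n \bar{\hat{F}}(1/z)$.

$F^{\sharp}(c)$  Reciprocal polynomial of $F(c)$, that is, $(1 + \tau c)^n \bar{F}\left(-c/(1 + \tau c)\right)$.

A q-system polynomial is q-symmetric if $\hat{F}(z) = \hat{F}^{\sharp}(z)$. A δ-system polynomial is
δ-symmetric if $F(c) = F^{\sharp}(c)$.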

Tabular forms of stability checking of a polynomial in $\mathbb{C}[w]_n$ typically employ a
sequence of polynomials, each of descending order. The first row of such a tabular form is
denoted as row $n$, the second row is row $n - 1$, and so on.

JT, MJT  Jury table [18], modified Jury table [15], [17].

real-q-BT  Bistritz table for q-system polynomials with real-valued coefficients [11].

complex-q-BT  Bistritz table for q-system polynomials with complex-valued coefficients [19].

real-δ-BT  Table form for δ-system polynomials with real-valued coefficients [6].

complex-δ-BT  Table form for δ-system polynomials with complex-valued coefficients (this paper).
A q-system polynomial with all its roots in $U_q$ (for the 1-D case) or $U_q^2$ (for the 2-D case)
is said to be stable. The corresponding regions for a δ-system polynomial are $U_\delta$ (for the
1-D case) or $U_\delta^2$ (for the 2-D case), respectively.


2.2.  Review  of  complex-q-BT 


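The complex-δ-BT introduced in Section 3 is based on the complex-q-BT, and hence, we
briefly review it now. For more details, see [10]. Let the characteristic polynomial of a
q-system be

\[
\hat{F}(z) = \sum_{k=0}^{n} \hat{f}_k z^k \in \mathbb{C}[z]_n
\quad \text{with } \hat{F}(1) \in \mathbb{R} \text{ and } \hat{F}(1) \neq 0. \tag{2.2}
\]

The complex-q-BT is formed using the q-symmetric polynomial sequence
$\{\hat{T}(z)_i\}_{i=0}^{n}$, with $\hat{T}(z)_i = \sum_{k=0}^{i} \hat{t}_k^{(i)} z^k$, where [19]

\[
\hat{T}(z)_i =
\begin{cases}
\hat{F}(z)_n + \hat{F}^{\sharp}(z)_n, & \text{for } i = n; \\[4pt]
\dfrac{\hat{F}(z)_n - \hat{F}^{\sharp}(z)_n}{z - 1}, & \text{for } i = n - 1; \\[8pt]
\dfrac{(\hat{\delta}_{i+2} + \bar{\hat{\delta}}_{i+2}\, z)\, \hat{T}(z)_{i+1} - \hat{T}(z)_{i+2}}{z}, & \text{for } i = n-2, n-3, \ldots, 0,
\end{cases} \tag{2.3}
\]

where

\[
\hat{\delta}_{i+2} = \frac{\hat{t}_0^{(i+2)}}{\hat{t}_0^{(i+1)}} = \frac{\hat{T}(0)_{i+2}}{\hat{T}(0)_{i+1}},
\quad i = n-2, n-3, \ldots, 0. \tag{2.4}
\]

As in [11] and [19], equating similar powers on either side, we may also get the following
determinantal rule: For $k = 0, 1, \ldots$ and $i = n-2, n-3, \ldots, 0$,

\[
\hat{t}_k^{(i)} = \frac{1}{\hat{t}_0^{(i+1)}}
\begin{vmatrix}
\hat{t}_0^{(i+2)} & \hat{t}_{k+1}^{(i+2)} \\[2pt]
\hat{t}_0^{(i+1)} & \hat{t}_{k+1}^{(i+1)}
\end{vmatrix}
+ \bar{\hat{\delta}}_{i+2}\, \hat{t}_k^{(i+1)}. \tag{2.5}
\]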


Remark. The computational advantage of the BT is due to $\hat{T}(z)_i$ being q-symmetric. This
implies $\hat{t}_k^{(i)} = \bar{\hat{t}}_{i-k}^{(i)}$, $k = 0, 1, \ldots, i$, and hence, it is necessary to
evaluate only half the coefficients of each row.

Using (12-13), (16), and Theorem 6 of [19], we get

Theorem 2.1. [19] The polynomial $\hat{F}(z) \in \mathbb{C}[z]_n$ is q-stable iff

I. $\hat{t}_0^{(i)} \neq 0$, $i = n-1, n-2, \ldots, 0$, and

II. $\nu_n = \mathrm{var}\{\hat{T}(1)_n, \hat{T}(1)_{n-1}, \ldots, \hat{T}(1)_0\} = 0$.
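The test in Theorem 2.1 can be sketched in a few lines. The code below is an illustrative transcription under stated assumptions, not the authors' implementation: it takes the table to be generated by T_n = F + F♯, T_{n-1} = (F - F♯)/(z - 1), and z·T_i = (d + conj(d)·z)·T_{i+1} - T_{i+2} with d = T_{i+2}(0)/T_{i+1}(0), where F♯ has the reversed, conjugated coefficients of F. The initial rotation that makes F(1) real, the tolerance `tol`, and all function names are choices made for this sketch.

```python
def divide_by_z_minus_one(poly):
    """Quotient of poly(z) / (z - 1); ascending coefficients, poly(1) must be 0."""
    quotient, carry = [], 0
    for coeff in poly[:-1]:
        carry = carry - coeff          # q_k = q_{k-1} - d_k, with q_{-1} = 0
        quotient.append(carry)
    return quotient


def sign_changes(values):
    """Number of sign changes in a sequence of nonzero real numbers."""
    return sum(1 for a, b in zip(values, values[1:]) if a * b < 0)


def is_q_stable(coeffs, tol=1e-12):
    """True if all roots of sum(coeffs[k] * z**k) lie in the open unit disc."""
    f = [complex(c) for c in coeffs]
    if len(f) < 2:
        raise ValueError("degree must be at least 1")
    f_at_one = sum(f)
    if abs(f_at_one) < tol:
        return False                   # root at z = 1
    # Rotate the polynomial so that F(1) is real; the roots are unchanged.
    rotation = f_at_one.conjugate() / abs(f_at_one)
    f = [c * rotation for c in f]
    f_sharp = [c.conjugate() for c in reversed(f)]

    upper = [a + b for a, b in zip(f, f_sharp)]                          # T_n
    lower = divide_by_z_minus_one([a - b for a, b in zip(f, f_sharp)])   # T_{n-1}
    values_at_one = [sum(upper).real, sum(lower).real]

    while len(lower) > 1:
        if abs(lower[0]) < tol:
            return False               # condition I violated
        d = upper[0] / lower[0]
        product = [0j] * (len(lower) + 1)          # (d + conj(d) z) * T_{i+1}
        for k, t in enumerate(lower):
            product[k] += d * t
            product[k + 1] += d.conjugate() * t
        # Subtract T_{i+2}; the end terms cancel, and dropping them divides by z.
        reduced = [p - u for p, u in zip(product, upper)][1:-1]
        upper, lower = lower, reduced
        values_at_one.append(sum(lower).real)

    if abs(lower[0]) < tol:
        return False
    return sign_changes(values_at_one) == 0        # condition II
```

For example, `is_q_stable([-0.25, 0, 1])` checks z² - 0.25 (roots ±0.5), and `is_q_stable([6, -5, 1])` checks (z - 2)(z - 3). The sketch treats any near-zero leading table entry as a failure rather than handling singular cases separately.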

2.3. Some results on 2-D stability

Consider the 2-D q-system transfer function

\[
\hat{H}(z_1, z_2) = \frac{\hat{E}(z_1, z_2)}{\hat{F}(z_1, z_2)} \tag{2.6}
\]

where $\hat{E}(z_1, z_2) \in \mathbb{R}[z_1]_{n_1}[z_2]_{n_2}$ and $\hat{F}(z_1, z_2) \in \mathbb{R}[z_1]_{n_1}[z_2]_{n_2}$. The 2-D
z-transform is taken using positive powers of $z_i$. For a comprehensive discussion regarding
stability of such systems, see [9-10], and references therein. Hence, for reasons of brevity,
only some analogous results applicable to 2-D δ-systems are provided. It is only necessary
to observe that the corresponding δ-system $H(c_1, c_2)$ satisfies

\[
H(c_1, c_2) = \frac{E(c_1, c_2)}{F(c_1, c_2)} = \hat{H}(z_1, z_2)|_{z \to c} \in \mathbb{R}(c_1, c_2) \tag{2.7}
\]

where $E(c_1, c_2) \in \mathbb{R}[c_1]_{n_1}[c_2]_{n_2}$ and $F(c_1, c_2) \in \mathbb{R}[c_1]_{n_1}[c_2]_{n_2}$. In the remainder of
this paper, we will only be dealing with transfer functions $H(c_1, c_2)$ that are devoid of
nonessential singularities of the second kind on $T_\delta^2$, and the pair $E(c_1, c_2)$ and $F(c_1, c_2)$
is taken to be coprime. If the 2-D polynomial $F(c_1, c_2) \neq 0$, $\forall (c_1, c_2) \in \bar{U}_\delta^2$, it is
said to be δ-stable. After using (2.1), the following result follows directly from [20]:

Theorem 2.2. The 2-D δ-system in (2.7) is δ-stable iff

I. $F(c_1, -1/\tau) \neq 0$, $\forall c_1 \in \bar{U}_\delta$, and

II. $F(c_1, c_2) \neq 0$, $\forall c_1 \in T_\delta$, $\forall c_2 \in \bar{U}_\delta$.

The following result, which allows one to use the real-δ-BT, is directly from [21-22] after using (2.1):

Theorem 2.3. The 2-D δ-system in (2.7) is δ-stable iff

I. $F(c_1, -1/\tau) \neq 0$, $\forall c_1 \in \bar{U}_\delta$, and

II. $G(x, c_2) \neq 0$, $\forall x \in [-2/\tau, 0]$, $\forall c_2 \in \bar{U}_\delta$.

Here $G(x, c_2) = F(c_1, c_2)\, F(\bar{c}_1, c_2)\big|_{x = (c_1 + \bar{c}_1)/2}$.

Schur-Cohn  minors 

In stability checking of 2-D q-systems, the following result is important:

Theorem 2.4. [15], [23-24] The polynomial $F(z) \in \mathcal{C}[z]_n$ is stable iff $\Delta_i > 0$, $i = 1, 2, \ldots, n$, where $\Delta_i$ is the $i$-th principal minor of the Hermitian Schur-Cohn matrix $\Gamma = \Gamma^* = \{\gamma_{ij}\} \in \mathcal{C}^{n \times n}$ defined as

i 

k=l 


Stability checking of 2-D q-systems then involves positivity checking of all Schur-Cohn minors $\Delta_i(z)$, $\forall i = 1, 2, \ldots, n$, $\forall |z| = 1$. A necessary and sufficient condition for this is positivity of $\Delta_i(1)$, $\forall i = 1, 2, \ldots, n$, and of $\Delta_n(z)$, $\forall |z| = 1$. This is the simplification due to [16] that has been effectively utilized in applying the MJT [15]. The advantage of the latter is that its entries yield the Schur-Cohn minors directly. The fact that the entries of the complex-q-BT also yield the Schur-Cohn minors was shown only recently.

Theorem 2.5. [10], [25] The Schur-Cohn minors of $F(z)$ are the principal minors of the $(n \times n)$ tridiagonal Hermitian matrix


A  = 


0 


Iry(«-2)Kn-3)n 

2^^n~2  ^0  J 


0  0 
0  0

3. Complex-δ-BT


With no loss of generality, consider the δ-system characteristic polynomial

$$F(c) = \sum_{k=0}^{n} f_k c^k \in \mathcal{C}[c]_n \tag{3.1}$$

where

$$F(0) \in \Re \quad \text{and} \quad F(0) > 0. \tag{3.2}$$

We now construct the complex-δ-BT with the use of the δ-symmetric polynomial sequence $\{T(c)_i\}_{i=0}^{n}$ where

$$T(c)_i = \begin{cases} F(c)_n + F^{\#}(c)_n, & i = n; \\[4pt] \dfrac{F(c)_n - F^{\#}(c)_n}{c}, & i = n-1; \\[4pt] \dfrac{(\delta_{i+2} + \bar{\delta}_{i+2}(1 + \tau c))\, T(c)_{i+1} - T(c)_{i+2}}{1 + \tau c}, & i \leq n-2. \end{cases} \tag{3.3}$$

Here

$$\delta_{i+2} = \frac{T(-1/\tau)_{i+2}}{T(-1/\tau)_{i+1}}, \quad i = n-2, n-3, \ldots, 0. \tag{3.4}$$

The normal conditions required to complete the sequence are

$$T(-1/\tau)_i \neq 0, \quad i = 1, 2, \ldots, n-1. \tag{3.5}$$

Remarks. 


1. To determine δ-stability of $F(c)$, one may of course first obtain $F(z) = F(c)|_{c \to z}$, and then determine its q-stability by applying familiar stability checking algorithms (e.g., BT or MJT). The possible shortcomings of such a scheme are outlined in [1] and [6]. The purpose here is to obtain a direct check for δ-stability.

2. We follow the work in [6] and [19], and hence, for brevity, all details are omitted.

3. The conditions $T(-1/\tau)_i = 0$, for some $i = 1, 2, \ldots, n-1$, imply certain singular conditions on the root distribution of $F(c)$ [11], [19]. The equivalent singular conditions for the real-δ-BT are in [6].

4.  Using  5-symmetry,  it  is  easy  to  show  that 


n-l/r). 


0, 1, . . . ,  n. 


(3.6)
Therefore

1 

^i+2  —  1  i  =  n  —  2,  n  —  3, . . . ,  0.  (3-7) 

The  normal  conditions  in  (3.5)  may  now  be  expressed  as 

7^  0,  i  =  n  -  2,n  -  3,. . .  ,0.  (3.8) 

Analogous to [6], [11], and [19], we then have

Theorem 3.1. The polynomial $F(c) \in \mathcal{C}[c]_n$ is stable iff

I. $T(-1/\tau)_i \neq 0$, $i = n-1, n-2, \ldots, 1$, and

II. $\nu_n = \mathrm{Var}\{T(0)_n, T(0)_{n-1}, \ldots, T(0)_0\} = 0$.
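
Condition II of Theorem 3.1 only requires counting sign variations in a real sequence. A minimal sketch of that count follows; the function name is hypothetical, and skipping exact zeros is an assumption made here for robustness (it is not a rule taken from the text).

```python
def sign_variations(values):
    """Count sign changes in a real sequence, ignoring exact zeros (assumption)."""
    signs = [1 if v > 0 else -1 for v in values if v != 0]
    return sum(1 for a, b in zip(signs, signs[1:]) if a != b)


def condition_ii_holds(table_values_at_zero):
    """Condition II: no sign variation among T(0)_n, ..., T(0)_0."""
    return sign_variations(table_values_at_zero) == 0


print(condition_ii_holds([4.0, 2.5, 1.0, 0.5]))
print(condition_ii_holds([4.0, -2.5, 1.0, 0.5]))
```

The list passed in is assumed to hold the real values $T(0)_n, \ldots, T(0)_0$ in table order.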

One of the main advantages of the complex-q-BT is that all computations may be carried out through real arithmetic only [19]. The same holds true for the complex-δ-BT introduced above as well. To see this, let

$$T(c)_i = S(c)_i + jA(c)_i \quad \text{with} \quad \delta_i = \mathrm{Re}[\delta_i] + j\,\mathrm{Im}[\delta_i], \tag{3.9}$$

for $i = 2, 3, \ldots, n$. It is easy to show that the $S(c)_i$'s and $A(c)_i$'s form sequences of δ-symmetric and δ-antisymmetric polynomials, respectively. Now, (3.3) may be expressed as

$$\begin{aligned} S(c)_{i-2} &= \frac{1}{1 + \tau c}\left[\mathrm{Re}[\delta_i](2 + \tau c) \cdot S(c)_{i-1} + \mathrm{Im}[\delta_i]\,\tau c \cdot A(c)_{i-1} - S(c)_i\right]; \\ A(c)_{i-2} &= \frac{1}{1 + \tau c}\left[-\mathrm{Im}[\delta_i]\,\tau c \cdot S(c)_{i-1} + (2 + \tau c)\,\mathrm{Re}[\delta_i] \cdot A(c)_{i-1} - A(c)_i\right], \end{aligned} \tag{3.10}$$

for $i = 2, 3, \ldots, n$.

Remark. Note that $T(0)_i = S(0)_i + jA(0)_i = S(0)_i$.

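In the real-δ-BT construction, a certain 'scaling' of $\{T(c)_i\}_{i=0}^{n}$ was useful [6]. We use the same technique in the complex-δ-BT case as well, thus providing the following advantages: (a) terms containing $\tau$ are avoided during construction, (b) $\delta_i$ and $\nu_n$ may be deduced by simple inspection, and thus (c) computational effort is reduced.

The sequence of polynomials that incorporates 'scaling' is $\{U(\zeta)_i\}_{i=0}^{n}$ where, with $t_k^{(i)}$ denoting the coefficient of $c^k$ in $T(c)_i$,

$$U(\zeta)_i = \sum_{k=0}^{i} u_k^{(i)} \zeta^k = T(c)_i\big|_{c = -\zeta/\tau}, \qquad u_k^{(i)} = \left(-\frac{1}{\tau}\right)^{k} t_k^{(i)}, \quad k = 0, 1, \ldots, i, \tag{3.11}$$

for $i = 0, 1, \ldots, n$. Thus, from (3.3), we get, for $i = n-2, n-3, \ldots, 0$,

$$\begin{aligned} u_0^{(i)} &= (\delta_{i+2} + \bar{\delta}_{i+2})\, u_0^{(i+1)} - u_0^{(i+2)}; \\ u_k^{(i)} &= (\delta_{i+2} + \bar{\delta}_{i+2})\, u_k^{(i+1)} - \bar{\delta}_{i+2}\, u_{k-1}^{(i+1)} - u_k^{(i+2)} + u_{k-1}^{(i)}, \quad k = 1, 2, \ldots, i. \end{aligned} \tag{3.12}$$

Note that

$$\delta_{i+2} = \frac{U(1)_{i+2}}{U(1)_{i+1}} = -\frac{\bar{u}_{i+2}^{(i+2)}}{\bar{u}_{i+1}^{(i+1)}}, \quad i = n-2, n-3, \ldots, 0, \tag{3.13}$$

and

$$\nu_n = \mathrm{Var}\{T(0)_i\}_{i=0}^{n} = \mathrm{Var}\{u_0^{(i)}\}_{i=0}^{n}. \tag{3.14}$$

Therefore, condition II of Theorem 3.1 may be checked by inspecting the constant coefficients of $\{U(\zeta)_i\}_{i=0}^{n}$.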


Remark.  One  may  use  the  same  ‘scaling’  strategy  in  an  implementation  that  uses  only  real 
arithmetic. 


Relationship between complex-q-BT and complex-δ-BT


As was agreed upon previously, given $F(c)_n \in \mathcal{C}[c]$, let us use the notation $F(z)_n$ to indicate

$$F(z)_n = \lambda F(c)_n \tag{3.15}$$

where $\lambda \in \Re$ is a possible scaling constant. The establishment of the relationship between the rows of the complex-q-BT of $F(z)$, i.e., $\{T(z)_i\}_{i=0}^{n}$, and the complex-δ-BT of $F(c)$, i.e., $\{T(c)_i\}_{i=0}^{n}$, which is the subject of this section, is useful later in obtaining the Schur-Cohn minors from the latter.

Claim 3.2.

$$F^{\#}(z)_n = \lambda F^{\#}(c)_n.$$

Proof. Note that

$$F^{\#}(z)_n = z^n \bar{F}\!\left(\frac{1}{z}\right)_{\!n} = \lambda z^n \bar{F}(c)_n\Big|_{c \to \frac{1 - z}{\tau z}},$$

$$F^{\#}(c)_n = (1 + \tau c)^n \bar{F}\!\left(\frac{-c}{1 + \tau c}\right) = z^n \bar{F}\!\left(\frac{1 - z}{\tau z}\right)\bigg|_{c \to z}.$$

The claim is thus proven. ■

Theorem 3.3. The rows of the complex-q-BT of $F(z)$ and the complex-δ-BT of $F(c)$ are related by

$$T(z)_i = \begin{cases} \lambda\, T(c)_i, & i = n, n-2, \ldots; \\[4pt] \dfrac{\lambda}{\tau}\, T(c)_i, & i = n-1, n-3, \ldots \end{cases}$$

Proof. First, using Claim 3.2, note that

$$T(z)_n = \lambda T(c)_n.$$

Thus, Theorem 3.3 is established for $i = n$. The case $i = n-1$ may also be established directly. For $i = n-2, n-3, \ldots, 0$, use (2.3) and (3.3). ■

Corollary  3.4. 


II 

r  Ai(') 

1  r*  ’ 

for  i  —  n,n  —  2, . 

•  *5 

for  t  =  n  —  l,n  — 

3,. 

II 

1  r*b  ’ 

for  i  =  n,  n  —  2, . 

1 

for  t  =  n  —  1,  n  — 

3,. 

Proof. This follows directly from Theorem 3.3.

4. Schur-Cohn Minors for δ-Systems


We now develop quantities that may be considered the analogs of Schur-Cohn minors for δ-system polynomials.


Lemma  4.1.  The  relationship  between  the  complex-^-BT  of  F(c)n  G  Q'[c]n  and  the  Schur- 
Cohn  minors  i  =z  I  ^  2, . . . ,  n,  of  F(z)n  e  is 


\  2 

2.j-2(n— I-I-I)  [  1  ‘'n-i  ^n-i+l  ^n-i  )^t-l 

~  ],  A„  =  1,  A.  =  0,  i  <  0. 


Proof.  Note  that,  the  relationship  between  the  complex-^-BT  of  F(z)n  and  its  Schur-Cohn 
minors  are  given  by  [25] 


/^n-i'4-1)  Kn-0 


y(n-i+l)^n-i)x  y 


1  i^u-i+l)Kn-i)i2  X 

2l^n-z-hl  ^n-i  I  ^ 


z'~2 


with  Ao  =  1  and  =  0,  i  <  0.  Now,  the  claim  follows  from  Corollary  3.4. 


■ 


Let 


D  —  diag 


1  1 

/y-  Tl  ^  y  Tl  ”  1  ^ 


Then,  from  Lemma  4.1,  A  in  Theorem  2.5  is  given  by 


(4.1) 


where 


A  = 


A  =  •  D  •  A  •  D  (4.2) 


r  r7<Ti-l)  ,(n-2)i 

2  >•  n-1  ^n-2  J 

0 

0 

n-2  i 

f(4-y’ey'i 

0 

0 

T  ry(^  — 2)  7(n  — 3)i 

2  l-^n-2  ^n-3  J 

Keityty’i 

0 

(4.3) 

0 

0 

0 

Clearly,  positive  definiteness  of  A  and  A  are  equivalent  statements.  Hence,  we  may 
consider the principal minors of A to be the Schur-Cohn minors of $F(c)$.

Definition 4.1. The Schur-Cohn minors of $F(c) \in \mathcal{C}[c]_n$ are the principal minors of the tridiagonal Hermitian matrix A in (4.3).

Therefore, from Theorem 2.4, we have

Theorem 4.2. The polynomial $F(c) \in \mathcal{C}[c]_n$ is stable iff $\Delta_i > 0$, $i = 1, 2, \ldots, n$, where $\Delta_i$ is the $(i \times i)$-principal minor of A in (4.3).
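
Because the matrix in Theorem 4.2 is tridiagonal and Hermitian, its leading principal minors obey the standard three-term recurrence $\Delta_i = a_i \Delta_{i-1} - |b_{i-1}|^2 \Delta_{i-2}$ with $\Delta_0 = 1$ and $\Delta_{-1} = 0$. The sketch below illustrates that generic recurrence only; the argument names `diag` and `off` are hypothetical, and filling them with the table entries of (4.3) is left to the reader.

```python
def leading_minors_tridiagonal(diag, off):
    """Leading principal minors of a Hermitian tridiagonal matrix.

    diag: real diagonal entries a_1..a_n.
    off:  (possibly complex) superdiagonal entries b_1..b_{n-1}.
    """
    minors = []
    prev2, prev1 = 0.0, 1.0  # Delta_{-1} and Delta_0
    for i, a in enumerate(diag):
        b_sq = abs(off[i - 1]) ** 2 if i > 0 else 0.0
        current = a * prev1 - b_sq * prev2
        minors.append(current)
        prev2, prev1 = prev1, current
    return minors


def all_minors_positive(diag, off):
    """Positivity test used in the stability statement."""
    return all(m > 0 for m in leading_minors_tridiagonal(diag, off))


print(leading_minors_tridiagonal([2.0, 2.0, 2.0], [1j, 1.0]))
```

Only the moduli of the off-diagonal entries enter the recurrence, which is why complex entries cause no extra work here.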

Remarks.

1. Tridiagonal Hermitian matrices constitute an important class of matrices that have been extensively investigated in the matrix theory literature [26]. See also [10].

2. Since the Schur-Cohn minors $\Delta_i$ obtained from the complex-q-BT are necessarily proper [10], [25], the Schur-Cohn minors defined above for δ-systems are proper as well.

In terms of the 'scaled' sequence of polynomials $\{U(\zeta)_i\}_{i=0}^{n}$, Theorem 4.2 may be stated as

Corollary 4.3. The polynomial $F(c) \in \mathcal{C}[c]_n$ is stable iff $\Delta_i > 0$, $i = 1, 2, \ldots, n$, where $\Delta_i$ is the $(i \times i)$-principal minor of the matrix in (4.3) written in terms of the scaled sequence.

5. Algorithm for Checking Stability of 2-D δ-Systems


To check condition II of Theorem 2.2, we may adopt the following approach:

(a) Express $F(c_1, c_2) \in \Re[c_1]_{n_1}[c_2]_{n_2}$ as a polynomial in $\mathcal{C}[c_2]_{n_2}$ such that its coefficients, as well as the corresponding Schur-Cohn minors, are parameterized by $c_1 \in T_\delta$. Here, we have assumed that $n_1 \geq n_2$; otherwise, the roles of $n_1$ and $n_2$ may be interchanged.

(b) Check positivity of each of the Schur-Cohn minors, or positive definiteness of the tridiagonal Hermitian matrix $A \in \mathcal{C}^{n_2 \times n_2}$, for all $c_1 \in T_\delta$ (see condition II of Theorem 2.2 and Theorem 4.2). These checks may be simplified by applying a direct extension of Siljak's result [16].

However, construction of the complex-δ-BT and the entries of A requires complex conjugation of certain entries that are functions of $c_1 \in T_\delta$. This of course complicates the scheme since $\bar{c}_1 = -c_1/(1 + \tau c_1)$, $\forall c_1 \in T_\delta$. On the other hand, in dealing with 2-D q-system stability, we have $\bar{z}_1 = 1/z_1$, $\forall z_1 \in T_q$. This simple relationship has led to stability checking schemes that use the complex versions of tabular forms [10] that incorporate the polynomial array method [27]. To circumvent the above difficulty, the algorithm given below uses the real-δ-BT in order to check Theorem 2.3. In the appendix, an easily implementable algorithm that yields

$$G(x, c_2) \triangleq G(x)_{n_1}(c_2)_{2n_2} = F(c_1, c_2)\, F(\bar{c}_1, c_2)\big|_{c_1 \in T_\delta,\; x = (c_1 + \bar{c}_1)/2} \in \Re[x]_{n_1}[c_2]_{2n_2} \tag{5.1}$$

is provided. Note that

$$c_1 \in T_\delta \iff x \in [-2/\tau, 0]. \tag{5.2}$$
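
The two boundary facts used above, $\bar{c}_1 = -c_1/(1 + \tau c_1)$ and $x = \mathrm{Re}\, c_1 \in [-2/\tau, 0]$, are easy to confirm numerically. The sketch below assumes the usual mapping $z = 1 + \tau c$ (so that $T_\delta$ is the image of the unit circle); the function names are illustrative only.

```python
import cmath
import math


def boundary_point(theta, tau):
    """Point c on T_delta, assuming z = 1 + tau*c with |z| = 1."""
    return (cmath.exp(1j * theta) - 1) / tau


def check_boundary_identities(tau, samples=24, tol=1e-9):
    """Check conj(c) = -c/(1 + tau*c) and Re(c) in [-2/tau, 0] on T_delta."""
    for k in range(samples):
        c = boundary_point(2 * math.pi * k / samples, tau)
        conj_ok = abs(c.conjugate() + c / (1 + tau * c)) < tol
        range_ok = -2 / tau - tol <= c.real <= tol
        if not (conj_ok and range_ok):
            return False
    return True


print(check_boundary_identities(0.1))
```

The conjugation rule is rational in $c_1$ rather than a simple inversion, which is the complication the text refers to and the reason the real variable $x$ is introduced.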

Before proceeding, however, it is important to note that tabular methods are useful in checking that no roots lie outside the stability region. However, since in typical 2-D stability studies the 2-D transforms are taken with positive powers [9-10], prior to applying the stability check, the following 'preparation' must be done:

(a) Condition I in Theorem 2.3 may be checked by explicitly finding the roots or applying the real-δ-BT to ensure

$$F^{\#}(c_1)(-1/\tau) = (1 + \tau c_1)^{n_1} F\!\left(\frac{-c_1}{1 + \tau c_1}\right)\!(-1/\tau) \neq 0, \quad \forall c_1 \in \mathcal{C} \setminus U_\delta \tag{5.3}$$

(that is, the polynomial is reciprocated with respect to $c_1$).

(b) First form

$$G(x)_{n_1}(c_2)_{2n_2} = \sum_{\ell=0}^{2n_2} g_\ell(x)\, c_2^{\ell} \in \Re[x]_{n_1}[c_2]_{2n_2}, \quad \text{where } g_\ell(x) = \sum_{k=0}^{n_1} g_{k,\ell}\, x^k \in \Re[x]_{n_1}. \tag{5.4}$$

Here $x \in [-2/\tau, 0]$. Now, condition II in Theorem 2.3 may be checked by applying the real-δ-BT to ensure

$$G^{\#}(x)(c_2) = (1 + \tau c_2)^{2n_2}\, G(x)\!\left(\frac{-c_2}{1 + \tau c_2}\right) \neq 0, \quad \forall x \in [-2/\tau, 0], \; \forall c_2 \in \mathcal{C} \setminus U_\delta \tag{5.5}$$

(that is, the polynomial is reciprocated with respect to $c_2$). Again, $x \in [-2/\tau, 0]$.

We  will  hence  implicitly  assume  that  the  given  2-D  ^-polynomial  has  already  been  ap¬ 
propriately  ‘prepared’  as  above.  In  addition,  the  construction  of  the  real-^-BT  for  G{x){c2) 
requires  ensuring  [11] 

and  ??„“■*>  0,  Va;el-2/T,01.  (5.6) 


Violation  of  the  first  condition  in  (5.6)  is  equivalent  to 

F(ci)(0)  =  0  for  some  Ci  G  Ts-  (5.7) 

Assuming,  with  no  loss  of  generality,  >  0  for  some  x  G  [— 2/r,  0],  violation  of  the 

second  condition  in  (5.6)  is  equivalent  to 

$$F(c_1)(-1/\tau) = 0 \text{ for some } c_1 \in T_\delta. \tag{5.8}$$

Therefore, each of these violations implies instability. Verifying condition (5.7) must be included in the algorithm. Condition (5.8) is automatically verified when condition I in Theorem 2.3 is checked (see (5.3)).

Then, we have the following

Theorem 5.1. The 2-D δ-system in (2.7) is stable iff

I. $F(c_1)(-1/\tau) \neq 0$, $\forall c_1 \in \bar{U}_\delta$, and

II. $F(c_1)(0) \neq 0$, $\forall c_1 \in T_\delta$, and

III. $\Delta_i(0) > 0$, $\forall i = 1, 2, \ldots, 2n_2$, and

IV. $\Delta_{2n_2}(x) > 0$, $\forall x \in [-2/\tau, 0]$, which is satisfied whenever $\Delta_{2n_2}(x) \neq 0$, $\forall x \in [-2/\tau, 0]$, together with condition III.

Here, A is the Hermitian matrix mentioned in Theorem 4.2 corresponding to $G(x)(c_2)$ where $x \in [-2/\tau, 0]$.

Conditions I and II in Theorem 5.1 are easy to carry out (they may in fact be verified by explicitly finding the roots). Conditions III and IV require construction of the real-δ-BT and the Schur-Cohn minors, for which we now develop polynomial arrays [27]. We also provide a scaling scheme so that the numerical reliability of the resulting algorithm is enhanced.
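
As a rough illustration of the kind of scaling meant here, the sketch below divides every coefficient in a table row by the largest coefficient magnitude in that row, so that all entries fall in $[-1, 1]$, and returns the constant used. Treating a row as a list of coefficient lists, and using one common constant per row, are assumptions of this sketch rather than details confirmed by the text.

```python
def scale_row(polys):
    """Scale a row (list of coefficient lists) into [-1, 1].

    Returns (scaled_row, scale) with scaled = original / scale.
    """
    scale = max(abs(c) for p in polys for c in p)
    if scale == 0:
        raise ValueError("row is identically zero")
    return [[c / scale for c in p] for p in polys], scale


row = [[2.0, -4.0], [1.0]]
scaled, constant = scale_row(row)
print(scaled, constant)
```

Keeping the returned constant makes it possible to undo or compensate the scaling in later rows and in the minors.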

5.1.  Polynomial  array  for  entries  of  real-6 -BT 
Express  G(x){c2)  as 

G(a:)(c2)  = 

where  ,  1]^,  ,  1]^,  and  G  = 

gj(ni+i)x(2n2+i)  jg  coefficient  matrix.  Then,  it  is  easy  to  show  that  [6] 

G(x){c2)  =  •  G  •  cf  where  G  = 

Here 

^  diag{r2"%T^"2“',...,l}  G  3?(2"2+i)x(2n2+i). 

P(2«2)  g  g^(2n2  +  l)x(2„2  +  l)  ^  ^  ^  ^2  n2  + 1 _ 

The elements $p_{ij}$, which in fact are those of Pascal's triangle, are given by

$$p_{ij} = \begin{cases} 0, & \text{for } i < j; \\ 1, & \text{for } i = j; \\ p_{i-1,j-1} + p_{i-1,j}, & \text{elsewhere.} \end{cases} \tag{5.11}$$
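
The recursion for $p_{ij}$ can be sketched directly. The code below uses zero-based indices and treats out-of-range entries as zero; both are conventions assumed here, and the function name is hypothetical.

```python
def pascal_lower(size):
    """Lower-triangular Pascal matrix built from the p_ij recursion."""
    p = [[0] * size for _ in range(size)]
    for i in range(size):
        for j in range(i + 1):
            if i == j:
                p[i][j] = 1
            else:
                left = p[i - 1][j - 1] if j > 0 else 0
                p[i][j] = left + p[i - 1][j]
    return p


for row in pascal_lower(5):
    print(row)
```

For the array in the text, `size` would be $2n_2 + 1$.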
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The  real-^-BT  is  constructed  using  the  ‘scaled’  polynomial  sequence  in  (3. 11-*  14).  Let 
^(y)(C)  =  G(a:)(c2)|e,=-c/.; 


c  =  ~  y/r 


Hiy)iO  =  G(a;)*(c2)  I  c2=-c/-  =  G{x){c2)  |  c2=-(/r  . 


(5.13) 


x--ylT 


c^-y/T 


Note that $x \in [-2/\tau, 0]$ iff $y \in [0, 2]$. Now, using (5.9-12), rows #$2n_2$ and #$(2n_2 - 1)$ of the corresponding 'scaled' real-δ-BT are given by


^=0 

_  Q7^2n2)“^^j(2n2)  p(2n2)^  .  ^2n2). 

rj(  _  H{y){Q  -  H{y){Q 

U{y){Q2n2-\  =  2^  c  - - - 


(5.14) 


e=o 


—  y(”l)^  .  _  p(2n2)^ 


^2n2-l) 

0 


where  . . . ,  if,  and 

=diag{(-r)"E(-r)"^-i,...,l} 
j(2n2)  ^  diag{(-l)^”E  (-l)^”"“^  .  .  .  ,  1}  G  3[j(2«2  +  l)x(2n2-H). 
p(2n2)  ^  ^  g^(2n2  +  l)x(2n2  +  l)  p..  ^ 

Each  element  of  the  remaining  rows  is  of  the  form 

\^\y)  =  ^^,i  =  0,l,...,t,  i  =  2n2,2n2-l,...,0. 


(5.15) 


u 


(5.16) 


where  n['\y)  G  3i[y]a(‘)  and  S'\y)  G  3R[y]<^(.)-  Substituting  in  (3.12),  it  is  easy  to  show 
that,  for  £  =  0,1, 


L  ,  .  -  .  , 


n 


p  =  -  2n('+'))  -  +  4-i>  ^  =  2n2  -  2, . . . ,  0; 


d(') 


t+2 

1 ,  for  i  =  2n2 , 2n2  —  1 , 

^(i-l-2)^(^^i)^  for  i  =  2n2  —  2, . . . ,  0. 


(5.17) 


Note  that  and  u 


(2«2-1)  _ 


n), 


.  Moreover 


(i)  _  J  ”1’  *  “  2n2,2n2  -  1, 

^  <  ^(.+2)  forz  =  2n2-2,...,0; 


(0  _  /  for  i  =  2n2,2n2  —  1, 

**  1  (7^‘^  —  nj ,  for  i  —  2n2  —  2, . . . ,  0. 


(5.18)

Scaling scheme. Let us scale rows #$2n_2$ and #$(2n_2 - 1)$ so that each coefficient takes values in $[-1, 1]$. Correspondingly, for $\ell = 0, 1, \ldots, i$; $i = 2n_2, 2n_2 - 1$, let

where  >  0,  t  =  2n2,2n2  —  1,  are  the  scaling  constants  and  [•]  denote  scaled 

quantities.  Note  that 


(5.19) 


d(2”2)  ^(2n2)  J(2n2)  ’ 


(6.20) 


^(2n2-l)  “  .y(2n2-l)  J(2n2-1) 

Now,  substituting  in  (5.17-18),  we  get 

(2n2-2) 


;^(2n2)A(2n2-l)  ”2n2 

j(2n2-2) 


(2n2)/-(2n2-l)  ,,-(2n2-l)x  -  (2n2 -1)  .  (2n2) 


_  2n 


'I  _  -^2(712-1;-^ 

)  ^2n2-l 


+ 


n 


(2n2-2) 

t-l 


A(2n2)A(2n2-l)  ’ 


(5.21) 


/y(2n2)^(2n2-l) 

It  can  now  be  seen  that,  it  is  only  necessary  to  compute  the  quantities  on  the  left  hand 
side  of  (5.21).  Then,  one  may  scale  these  to  get 

^(2n2“2)  _  ^(2n2)y(2n2-l)^(2n2-2)^(2n2-2)^ 

^  ^  ’  (5.22) 

j(2n2-2)  ^  ^(2n2)^(2n2-2)^(2n2-l)J(2n2-2)^ 

Note  that 

^(27.2-2)  \{2n7) \i2n7-2)  „(2«3-2) 

(5.23) 


j(2n2-2)  j(2n2)j(2n2-2)  J(2n2-2)  ’ 

Continuing  in  this  manner,  the  computation  of  the  entries  of  real-(5-BT  may  be  summarized 
as  follows: 

(a)  From  (5.14),  compute  i  =  2n2,27i2  —  1. 

(b)  From  (5.19),  use  scaling  constants  i  —  2n2,2n2  —  1,  to  get  i  = 

2n2,  2n2  -  1. 

(c)  From  (5.21),  for  £  =  0, 1, . . . ,  i;  i  =  2n2  —  2, . . . ,  0,  compute 


ICn 

d(') 


-  ,2(*  +  2).  (7+1) 

/  ^  —  "'i+2 


)  -  nir, 


(«+l) -((  +  2) 


n, 


+ 


n 


f-i 


K 


(0 


(5.24) 


K 


(<■) 
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and  use  scaling  constants  i  =  2n2  —  2, . . . ,  0,  to  get  i  =  2n2  2, . . . ,  0. 

Here,  Kn^  and  are  constants. 

(d)  Notice  the  relationships 


y(2n2)y(2n2-2)...y(i)  1 

;^(2n2-l)^(2n2-3)...;^(i) 

2 n 2  —  1 ) *y (2 ^ 2  ~  * 'y( *  )  (f(0  ’ 


for  i  —  2n2, 2n2  —  2, . . . ,  0; 
for  i  “  2n2  “  1, 2n2  —  3, . . . ,  1. 


(5^25) 


5.2.  Polynomial  array  for  Schur-Cohn  minors 


Each  Schur-Cohn  minor  obtained  from  the  table,  in  general,  will  be  of  the  form 

^i{y)  =  ^  2, . . . , 2^2, 

where  N^^\y)  G  3^[y]p(.)  and  G  3?[y]p(i)-  From  Corollary  4.3,  we  get 


(5.26) 


Ai(y)  =  ‘“S-T’A.-I  -  Am,  i  =  1,2,. . .  ,2n2,  (5.27) 


(2n2-i-l-l)^  (2n2-i)^ 


where  Aq  =  1  and  Aj  =  0,  Vz  <  0. 


Remark. Actually, as in [10], one may show that, for stability determination purposes, only the numerator polynomials of $\Delta_i$ need be computed. However, to contain the orders of the resulting polynomials, and hence improve numerical conditioning, we do not recommend this scheme.


Scaling  scheme.  Due  to  the  scaling  of  entries  of  the  real-^-BT,  computation  of  A;,  i  = 
1,2,...,  2n2,  may  be  modified  as  follows;  Let 


Ax 


—u 


(2712) 

2n2 


n 


(2712-1) 
2712  —  1 


,y(7l22).y(2n2  — 1)  ^(27l2)(^(2n2— 1) 


Hence,  it  is  only  necessary  to  compute  the  quantity 


Ai  = 


42712)  .(2712-1) 
^^2  ^^2712-1 

J(2712)J(2712-1) 


(5.28) 


(5.29)
Continuing in this manner, the computation of the Schur-Cohn minors may be summarized as follows: From (5.27), for $i = 1, 2, \ldots, 2n_2$, compute


Ai 


.(2n2-i+l) -(2n2-t) 
'*2n2-»-H  'hn2-i 
J(2n2-i-|-l)  J(2n2-0 


Ai_i  -f 


1  A( 


2n2 


(2n2“t+l)  ^.(2n2— i) 

2712-1  +  1  ^2712-7 


4^(2772-0  J(27l2-7+l)  J(2772  -7) 


A, 


(5.30) 


where  Aq  =  1  and  A^  =  0,  Vz  <  0. 


Remark.  Note  that,  since  Aj(y)  is  necessarily  a  proper  polynomial  (that  is,. denominator  di¬ 
vides  numerator  properly  with  no  remainder),  and  not  a  rational  polynomial  (see  Remark  2 
after  Theorem  4.2),  it  is  easy  to  see  that  J(2«2-i+i)  J(2772-t)  divide 

exactly. 


5.3.  Algorithm 


The following result, which is the basis of the stability checking algorithm, is now obvious from [10] and Theorem 5.1:

Theorem 5.4. The 2-D δ-system in (2.7) is stable iff

I. $F(c_1, -1/\tau) \neq 0$, $\forall c_1 \in \bar{U}_\delta$, and

II. $F(c_1)(0) \neq 0$, $\forall c_1 \in T_\delta$, and

III. $\Delta_i(0) > 0$, $\forall i = 1, 2, \ldots, 2n_2$, and

IV. $\Delta_{2n_2}(y) \neq 0$, $\forall y \in [0, 2]$.

The 2-D stability checking algorithm may now be summarized as follows:

Given.

A 2-D δ-polynomial $F(c_1, c_2) \in \Re[c_1]_{n_1}[c_2]_{n_2}$. Without any loss of generality, assume that $n_1 \geq n_2$, and express $F(c_1, c_2)$ as $F(c_1)_{n_1}(c_2)_{n_2}$.

Step I. Condition I of Theorem 5.4:

Apply an explicit root location procedure. If the result is satisfactory, proceed; otherwise, the system is unstable.

Step II. Condition II of Theorem 5.4:

Apply an explicit root location procedure. If the result is satisfactory, proceed; otherwise, the system is unstable.

Step III.

Form $G(y)(c_2)$ using the algorithm in the appendix; then form $U(y)(\zeta)_{2n_2}$ and $U(y)(\zeta)_{2n_2-1}$ from (5.14). These yield $n_\ell^{(2n_2)}$ and $n_\ell^{(2n_2-1)}$. Of course, $d^{(2n_2)} = d^{(2n_2-1)} = 1$.

From (5.19), obtain the scaled quantities $\hat{n}_\ell^{(2n_2)}$ and $\hat{n}_\ell^{(2n_2-1)}$ and the associated scaling constants $\lambda^{(2n_2)}$ and $\lambda^{(2n_2-1)}$. Of course, $\hat{d}^{(2n_2)} = \hat{d}^{(2n_2-1)} = \gamma^{(2n_2)} = \gamma^{(2n_2-1)} = 1$.

Step IV. Condition III of Theorem 5.4:

Form $\Delta_1(y)$ from (5.30) and check whether $\Delta_1(0) > 0$.

If the result is satisfactory, form $\hat{n}_\ell^{(2n_2-2)}$, $\hat{d}^{(2n_2-2)}$, and the associated scaling constants $\lambda^{(2n_2-2)}$ and $\gamma^{(2n_2-2)}$ from (5.24). Form $\Delta_2(y)$ from (5.30) and check whether $\Delta_2(0) > 0$.

If the result is satisfactory, proceed likewise until $\Delta_{2n_2}(0) > 0$ is checked. Note that this requires checking only the constant coefficients. If the result is satisfactory, proceed; otherwise, if the check fails at any $i = 1, 2, \ldots, 2n_2$, the system is unstable.

Step V. Condition IV of Theorem 5.4:

Apply an explicit root location procedure to check whether $\Delta_{2n_2}(y) \neq 0$, $\forall y \in [0, 2]$.
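
One way to carry out the interval check in Step V without computing roots is a Sturm sequence. The sketch below is a generic exact-arithmetic version using Python's `fractions`; the function names are hypothetical, and coefficients are listed highest degree first, matching the shorthand used in the example section.

```python
from fractions import Fraction


def _trim(p):
    i = 0
    while i < len(p) - 1 and p[i] == 0:
        i += 1
    return p[i:]


def _rem(a, b):
    """Remainder of a divided by b (b has a nonzero leading coefficient)."""
    a = list(a)
    while len(a) >= len(b):
        f = a[0] / b[0]
        for i in range(len(b)):
            a[i] -= f * b[i]
        a.pop(0)  # leading term has been cancelled
    return _trim(a) if a else [Fraction(0)]


def _eval(p, x):
    v = Fraction(0)
    for c in p:
        v = v * x + c
    return v


def sturm_chain(p):
    p = _trim([Fraction(c) for c in p])
    n = len(p) - 1
    dp = [c * (n - i) for i, c in enumerate(p[:-1])] or [Fraction(0)]
    chain = [p, _trim(dp)]
    while any(chain[-1]):
        chain.append([-c for c in _rem(chain[-2], chain[-1])])
    return chain[:-1]  # drop the trailing zero polynomial


def _variations(chain, x):
    vals = [v for v in (_eval(q, x) for q in chain) if v != 0]
    return sum(1 for a, b in zip(vals, vals[1:]) if (a > 0) != (b > 0))


def has_no_root_in(p, lo, hi):
    """True iff p has no real root in the closed interval [lo, hi]."""
    lo, hi = Fraction(lo), Fraction(hi)
    q = [Fraction(c) for c in p]
    if _eval(q, lo) == 0 or _eval(q, hi) == 0:
        return False
    chain = sturm_chain(q)
    return _variations(chain, lo) == _variations(chain, hi)


print(has_no_root_in([1, 0, 1], 0, 2))    # y^2 + 1
print(has_no_root_in([1, -2, 1], 0, 2))   # (y - 1)^2
```

Exact rational arithmetic avoids sign errors in the chain, at the cost of growing numerators and denominators for high-degree inputs.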

Remarks. The possible numerical difficulties that may arise in using explicit root location procedures may be avoided as follows: (a) Steps I and II may be verified using the real-δ-BT [6], and (b) Step V may be verified by the Sturm sequence method.

6. Example


The stability checking algorithm presented in the previous section is now illustrated through an example. Polynomial entries are denoted using a self-explanatory shorthand notation where the highest degree coefficient is written first. Moreover, only four decimal digits of the mantissa are shown.

Consider the 2-D polynomial

$$F(c_1, c_2) = \begin{bmatrix} c_1^2 & c_1 & 1 \end{bmatrix} \begin{bmatrix} 1 & 50 & 740 \\ 52 & 2700 & 38480 \\ 740 & 38480 & 547600 \end{bmatrix} \begin{bmatrix} c_2^2 \\ c_2 \\ 1 \end{bmatrix}$$

with the sampling time $\tau = 0.1$ s.

Step I. Condition I of Theorem 5.4:

By applying an explicit root location procedure, one can show that

$$F(c_1)(-1/\tau) = 340c_1^2 + 16680c_1 + 236800 \neq 0, \quad \forall c_1 \in \bar{U}_\delta.$$

Step II. Condition II of Theorem 5.4:

By applying an explicit root location procedure, one can show that

$$F(c_1)(0) = 740c_1^2 + 38480c_1 + 547600 \neq 0, \quad \forall c_1 \in T_\delta.$$
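
Steps I and II of the example can be reproduced with a few lines. The sketch below assumes the mapping $z = 1 + \tau c$, so that a root $c$ lies outside the closed stability region exactly when $|1 + \tau c| > 1$; the helper names are illustrative.

```python
import cmath

TAU = 0.1
# M[r][s] is the coefficient of c1**(2 - r) * c2**(2 - s) in F(c1, c2).
M = [[1, 50, 740],
     [52, 2700, 38480],
     [740, 38480, 547600]]


def slice_in_c2(m, c2):
    """Coefficients (highest power first) of F(c1, c2) as a polynomial in c1."""
    return [row[0] * c2 ** 2 + row[1] * c2 + row[2] for row in m]


def quadratic_roots(a, b, c):
    d = cmath.sqrt(b * b - 4 * a * c)
    return (-b + d) / (2 * a), (-b - d) / (2 * a)


def delta_radius(c, tau=TAU):
    """|1 + tau*c|: below 1 inside the stability region, above 1 outside."""
    return abs(1 + tau * c)


step1 = slice_in_c2(M, -1 / TAU)   # F(c1)(-1/tau)
step2 = slice_in_c2(M, 0.0)        # F(c1)(0)
print(step1, [delta_radius(r) for r in quadratic_roots(*step1)])
print(step2, [delta_radius(r) for r in quadratic_roots(*step2)])
```

Both quadratics have complex-conjugate roots with $|1 + \tau c| > 1$, which agrees with the two non-vanishing statements above.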


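Step III. Using the algorithm in the Appendix, we get

$$G(x)(c_2) = \begin{bmatrix} x^2 & x & 1 \end{bmatrix} \begin{bmatrix} 1.2800\mathrm{e}{+}03 & 1.2992\mathrm{e}{+}05 & 5.1904\mathrm{e}{+}06 & 9.6141\mathrm{e}{+}07 & 7.0093\mathrm{e}{+}08 \\ 5.2480\mathrm{e}{+}04 & 5.4011\mathrm{e}{+}06 & 2.1662\mathrm{e}{+}08 & 3.9968\mathrm{e}{+}09 & 2.8738\mathrm{e}{+}10 \\ 5.4760\mathrm{e}{+}05 & 5.6950\mathrm{e}{+}07 & 2.2912\mathrm{e}{+}09 & 4.2143\mathrm{e}{+}10 & 2.9987\mathrm{e}{+}11 \end{bmatrix} \begin{bmatrix} c_2^4 \\ c_2^3 \\ c_2^2 \\ c_2 \\ 1 \end{bmatrix}$$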

After  scaling,  rows  ^4  and  ^3  are  computed  as  follows: 

=  [1.2859e  -  02,  -5.0693e  -  02,  5.1315e  -  02]; 

=  [-7.9833e  -  02,  3.1991e  -  01,  -3.2798e  -  01]; 
=  [1.9671e  -  01,  -7.9909e  -  01,  8.2798e  -  01]; 

=  [-2.3375e  -  01,  9.58.36e  -  01,  -l.OOOOe  +  00]; 
=  [1.1687e  -  01,  -4.7918e  -  01,  5.0000e  -  01], 
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with  =  1.1995e  +  12,  and 

hf  ^  =  [-2.4050e  -  02,  9.4053e  -  02,  -9.4595e  -  02]; 

=  [1.3044e  -  01,  -5.1542e  -  01,  5.2252e  -  01]; 
hf  ^  =  [-2.4703e  -  01,  9.8194e  -  01,  -l.OOOOe  +  00); 

=  [1.6469e  -  01,  -6.5463e  -  01,  6.6667e  -  01], 
with  =  5.3490e  +  10.  Of  course,  ^  ^  ^(4)  ^  ^(3)  ^  ^ 

Step  IV.  Condition  III  of  Theorem  5.4: 

We  get 

Ai  =  [3.0926e  -  04,  -2.4286e  -  03,  7.2184e  -  03,  -9.6217e  -  03,  4.8541e  -  03]. 

Clearly,  Ai(0)  =  4.8541e  -  03  >  0. 

Now,  row  ^2  is  computed  as  follows: 

rtf'*  =  [-8.8619e  -  03,  6.8253e  -  02,  -1.9920e  -  01,  2.6099e  -  01,  -1.2957e  -  01]; 
=  [3.3584e  -  02,  -2.5969e  -  01,  7.6068e  -  01,  -l.OOOOe  +  00,  4.9793e  -  01]; 

=  [-3.3584e  -  02,  2.5969e  -  01,  -7.6068e  -  01,  l.OOOOe  +  00,  -4.9793e  -  01], 
with  A^^)  =  4.2420e  —  02.  Also, 

J(2)  ^  [_2.5424e  -  01,  9.9428e  -  01,  -l.OOOOe  +  00], 

with  7^^^  =  9.4595e  —  02.  We  get 

A2  =  [1.8046e  -  07,  -2.8190e  -  06,  1.9343e  -  05,  -7.6148e  -  05,  1.8810e  -  04, 

-  2.985 7e  -  04,  2.9737e  -  04,  -1.6992e  -  04,  4.2654e  -  05]. 

Clearly,  A2(0)  =  4.2654e  —  05  >  0. 

Now,  row  is  computed  as  follows: 

=  [2.5168e  -  03,  -2.8515e  -  02,  1.3555e  -  01,  -3.4597e  -  01,  5.0000e  -  01, 

-  3.8792e-  01,  1.2623e-01]; 

=  [-5.0336e  -  03,  5.7031e  -  02,  -2.7110e  -  01,  6.9194e  -  01,  -l.OOOOe  +  00, 
7.7584e  -  01,  -2.5246e  -  01], 
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with  =  3.0980e  -  02.  Also, 

^  =  [— 3.3954e  —  02,  2.6151e  —  01,  — 7.6322e  —  01,  l.OOOOe  +  00,  — 4.9646e  —  01], 
with  7^^)  =  2.6099e  -  01.  We  get 

As  =  [4.0500e  -  10,  -9.3525e  -  09,  9.9260e  -  08,  -6.4020e  -  07,  2.7947e  -  06, 

-  8.6990e  -  06,  1.9797e  -  05,  -3.3188e  -  05,  4.0679e  -  05,  -3.5552e  -  05, 
2.1029e  -  05,  -7.5594e  -  06,  1.2489e  -  06], 

Clearly,  A3(0)  =  1.2489e  -  06  >  0. 

Now,  row  #0  is  computed  as  follows; 

=  [-1.0487e  -  04,  1.9379e  -  03,  -1.6174e  -  02,  8.0291e  -  02,  -2.6251e  -  01, 

5.9070e  -  01,  -9.2642e  -  01,  l.OOOOe  +  00,  -7.1104e  -  01,  3.0076e  -  01, 
-  5.7473e-02], 

with  =  4.4719e  -  02.  Also, 

=  [-6.7946e  -  04,  1.0355e  -  02,  -6.9373e  -  02,  2.6679e  -  01,  -6.4420e  -  01, 
l.OOOOe  +  00,  -9.7458e  -  01,  5.4519e  -  01,  -1.3404e  -  01], 
with  =  9.4174e  -  01.  We  get 

A4  =  [4.3531e  -  12,  -1.3058e  -  10,  1.8400e  -  09,  -1.6166e  -  08,  9.9118e  -  08, 

-  4.4970e  -  07,  1.5618e  -  06,  -4.2352e  -  06,  9.0628e  -  06,  -1.5355e  -  05, 
2.0530e  -  05,  -2.1433e  -  05,  1.7129e  -  05,  -l.OlSOe  -  05,  4.1814e  -  06, 

-  1.0762e  -06,  1.3014e-07]. 

Clearly,  A4(0)  =  1.3014e  —  07  >  0. 

Step V. Condition IV of Theorem 5.4:

By applying an explicit root location procedure, one can show that

$$\Delta_4(y) \neq 0, \quad \forall y \in [0, 2].$$

Thus, we conclude that $F(c_1, c_2)$ is stable.

7. Conclusion and Final Remarks


In this paper, we have developed an efficient stability checking algorithm applicable to 2-D δ-system characteristic polynomials. Our purpose here is to obtain a direct algorithm because of the possible numerical disadvantages associated with indirect methods that utilize transformation techniques.

In arriving at the algorithm, the following contributions have been made: (a) a tabular method of stability checking applicable to δ-system polynomials possibly possessing complex-valued coefficients, (b) quantities that may be regarded as the Schur-Cohn minors applicable to such systems, and (c) polynomial arrays for computing both table entries and Schur-Cohn minors.

The proposed Schur-Cohn minors let one use a Siljak-like simplification [16] in the stability check. Although the algorithm utilizes only the real-δ-BT, results regarding the Schur-Cohn minors are in fact valid for the more general complex-valued coefficient case as well.

As in [10], it is possible to develop the algorithm such that only the numerator polynomials of the entries of the real-δ-BT and the Schur-Cohn minors are computed. Then, we do not require polynomial division operations. However, our experience has been that such a scheme is prone to be numerically unreliable. This is mainly due to the explosion of polynomial degree, especially in computing the Schur-Cohn minors. To avoid these difficulties and enhance numerical reliability, we have (a) introduced a scaling scheme, and (b) used polynomial division to contain the polynomial degree. The latter is not new; in fact, the MJT also uses this. If the user is interested in implementing the algorithm using PRO-MATLAB [28], these polynomial division operations may be conveniently performed using the routine deconv.


We  believe  that  a  suitable  scaling  strategy  can  improve  the  numerical  reliability  of 
the  MJT  as  well.  The  authors  are  currently  looking  into  this. 

The algorithm developed is easily implementable on a computer. The authors have implemented it via a C-language routine that the interested reader may request from the second author.

Appendix. Algorithm to obtain $G(x)_{n_1}(c_2)_{2n_2}$ from $F(c_1)_{n_1}(c_2)_{n_2}$


Given

$$F(c_1)_{n_1}(c_2)_{n_2} = \sum_{\ell=0}^{n_2} f_\ell(c_1)\, c_2^{\ell}, \quad \text{where } f_\ell(c_1) = \sum_{k=0}^{n_1} f_{k,\ell}\, c_1^{k}, \quad c_1 \in T_\delta, \tag{a.1}$$

we now develop an algorithm that yields

$$G(x)_{n_1}(c_2)_{2n_2} = \sum_{j=0}^{2n_2} g_j(x)\, c_2^{j} = F(c_1)_{n_1}(c_2)_{n_2} \cdot F(\bar{c}_1)_{n_1}(c_2)_{n_2}, \quad c_1 \in T_\delta. \tag{a.2}$$

First,  we  see 

n2  n2 

GW(‘=2)  =  EE«<:0«c-i)-c'+> 

fcO  J=:0 

722  2712  j  (s-.s) 

f=o  j=e  j=o  e=o 

(quantities  with  negative  subscripts  are  taken  to  be  zero).  Hence,  comparing  (a.2-3),  we 
get 

“  Til  721 


i=0  /z=n 


j 

=  E 


^=0 


0  7=0 

721  ni 


72i  72i 
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Let  US  use  the  notation 


„(n)  _ 


q  '  =  Cl  e  Ti,  n  =  0, 1, - 


Noting  that,  for  Ci  G  75, 

it  is  easy  to  show  that 

Substituting  in  (a.5),  we  get 

j 

feo 

Substituting  in  (a.4),  we  get 

j  ni 


Cl  = 


Cl 


1  +  rci  ’ 


2  (1) 

Cl  Cl  =  — ci  h 


,fc=0  t=0 


(1)'  a-i) 


e=o  k=o 


_ 2\  ^  J.  ^ ^ 

fk,efk,j~e  (  qr  ]  ■  ^1  ^  “^fk/fij-i 

'  1=0 


-2\‘' 


Cl  Cl 


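Now, in order to develop the algorithm, we need a recursive procedure to compute $c_1^{(n)} = (c_1^n + \bar{c}_1^n)/2$, $n = 0, 1, \ldots$. To proceed, we note that, since $c_1 \bar{c}_1 = -(2/\tau)\,x$ for $c_1 \in T_\delta$,

$$c_1^{(n)} = \frac{(c_1 + \bar{c}_1)(c_1^{n-1} + \bar{c}_1^{n-1}) - c_1 \bar{c}_1 (c_1^{n-2} + \bar{c}_1^{n-2})}{2} = 2x\, c_1^{(n-1)} + \frac{2}{\tau}\, x\, c_1^{(n-2)}, \quad n = 2, 3, \ldots \tag{a.11}$$

Let

$$c_1^{(n)} = \sum_{\ell=0}^{n} c_{n,\ell}\, x^{\ell} \tag{a.12}$$

where

$$c_1^{(1)} = x.$$

Remark. Note that

$$c_1^{(0)} = 1.$$

Substituting (a.12) in (a.11), and equating similar coefficients, we get

$$c_{n,\ell} = 2\left(c_{n-1,\ell-1} + \frac{1}{\tau}\, c_{n-2,\ell-1}\right), \quad \ell = 0, \ldots, n, \quad n = 2, 3, \ldots$$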


(a.6) 

(a.7) 

(a.S) 

(a.9) 
For instance,
$n = 0, 1, \ldots, 5$, may be conveniently obtained from


Cj 

- 

^1 

— 

Cl 

Cl 

LCj  J 

L 

1 

2/r  2 

0  6/r 
Cycle Behavior Of Digital Filters

Abstract - The presence of limit cycles that may arise in a fixed-point arithmetic implementation
of a digital filter can significantly impair its performance. This paper presents
an algorithm that can be utilized to determine the presence or the absence of such limit cycles
in a given digital filter. The filter is assumed to be in its state-space formulation and hence,
performance of the corresponding direct form representation follows as a special case. Moreover,
the algorithm is applicable independent of the filter order, type of quantization nonlinearity, and
whether the accumulator is single-length or double-length. In developing the algorithm, bounds
on the amplitude and period of limit cycles of a given digital filter are obtained. The robustness of
the algorithm in terms of limit cycle performance with respect to filter coefficient perturbations
is verified. Hence, it may be utilized to obtain regions in the coefficient space where a digital
filter of given order is limit cycle free.

I Introduction


A  digital  filter  may  be  realized  using  either  a  general  purpose  digital  computer  or  special  purpose 
digital  hardware.  In  either  case,  the  coefficients  and  intermediate  results  of  computations  must 
be  stored  in  binary  form  in  registers  of  finite  wordlength.  Limit  cycle  oscillations  are  a  direct 
result  of  this  limitation,  and  care  must  be  taken  to  suppress  them  while  performing  a  digital 
filter  design. 

For  the  past  several  years,  this  in  fact  has  been  a  research  topic  of  interest,  and  a  significant 
amount  of  insight  and  research  results  are  now  available  [1]-[10].  In  an  implementation  of  a 
higher  order  digital  filter,  as  shown  in  [11],  a  cascade  or  parallel  form  composed  of  first-order 
and  second-order  subfilters  is  preferable  over  any  direct  form  realization.  Therefore  the  results 
are summarized for the second order realizations. Most existing results focus on the effects of
signed magnitude rounding and truncation quantization schemes with regard to the existence of
limit cycles. Recently, some work addressing the two's complement truncation scheme has also
appeared  [12]-[14]. 

This  work  proposes  an  algorithm  that  may  be  used  to  check  for  limit  cycles  of  a  given  digital 
filter implemented using fixed-point arithmetic. It possesses a wide scope of applicability: The
digital filter to be tested may be of any order; the quantization scheme may be arbitrary, including
truncation and rounding schemes corresponding to signed magnitude and two's complement;
and the accumulator may be of single- or double-length.

Given  a  digital  filter,  we  develop  bounds  on  the  amplitude  and  period  of  possible  limit 
cycles.  The  algorithm  is  based  on  an  exhaustive  search  procedure  over  all  these  possibilities.  In 
addition,  extending  the  same  procedure  to  the  entire  linear  stability  region,  one  may  utilize  it 
to obtain regions in the filter coefficient space where the given filter is globally asymptotically
stable (g.a.s.). For this purpose, the robustness of the algorithm in terms of presence or absence
of limit cycles with respect to filter coefficient perturbations is also verified. A similar concept
has been used before for checking limit cycle behavior of digital filters implemented in direct
form  [10],  [15]-[18].  The  major  advantage  of  the  proposed  method  is  that  it  is  applicable  for 
the  more  general  state-space  implementations.  Of  course,  the  direct  form  implementation  then 
follows  as  a  special  case. 

The  paper  is  organized  as  follows.  Section  II  contains  the  nomenclature  used  throughout  the 
paper.  Section  III  provides  bounds  on  the  amplitude  and  period  of  limit  cycles  of  a  given  general 
digital  filter.  Section  IV  discusses  the  algorithm  and  its  computational  aspects.  Section  V 
addresses  the  robustness  of  the  algorithm  with  respect  to  perturbations  of  filter  coefficients. 
Section  VI  contains  some  situations  where  the  algorithm  developed  has  been  used  effectively. 
Finally,  Section  VII  contains  the  concluding  remarks. 

II  Nomenclature 

The  following  notation  will  be  used  throughout  the  paper. 

$\Re$, $Z$                           Set of reals, set of integers.

$C$                                  Set of complex numbers.

$Z^{+}$                              Nonnegative integers.

$\Re^{m \times n}$, $Z^{m \times n}$     Set of matrices of size $m \times n$ over the reals and integers.

$\Re(z)^{m \times n}$                Set of matrices of size $m \times n$ over the rational polynomials in the indeterminate $z \in C$.

$\mathcal{K}[\cdot]$                 Cardinality of set $[\cdot]$.

$a_{ij}$                             $(i,j)$-th element of the matrix $A = \{a_{ij}\}$.

$I$, $0$                             Identity matrix and null matrix of appropriate sizes.

$\mathbf{x}(k)$                      Filter state vector at instant $k$.

$x_i(k)$                             $i$-th component of the state vector $\mathbf{x}(k)$.

$\|\cdot\|_\infty$                   The infinity norm. For $\mathbf{x} = \{x_i\} \in \Re^{m}$, $\|\mathbf{x}\|_\infty = \max_i |x_i|$; for $A = \{a_{ij}\} \in \Re^{m \times n}$, $\|A\|_\infty = \max_{i,j} |a_{ij}|$.

$M_i$                                Upper bound for absolute value of amplitude of $x_i(k)$, $k \in Z^{+}$.

$\bar{M}_i$                          Largest integer less than or equal to $M_i$.

$\delta(k)$                          Dirac delta function.

$H_{ij}(z)$                          $(i,j)$-th element, that is, the $(i,j)$-th transfer function, of the MIMO transfer function $H(z)$.

$h_{ij}(k)$                          Impulse response of $H_{ij}(z)$.

$P^{(\ell)}$                         $\ell$-th pole (accounting for multiplicity) of $H_{ij}(z)$.

$K_{ij}$                             Constant term in the partial fraction expansion of $H_{ij}(z)$.

$r_{ij}^{(k)}$                       $k$-th residue of $H_{ij}(z)$.

$q$                                  Quantization step size.

$Q[\cdot]$                           Quantization nonlinearity operator.

$\varrho$                            Normalized quantization error. For instance, for roundoff, $\varrho = 0.5$, and for truncation, $\varrho = 1$.

$N$                                  Number of nonlinearities in a realization.

$\mathbf{e}(k)$                      Quantization error vector.

$T$                                  Limit cycle period.

$\mathcal{S}^{(0)}$                  Set of state vectors satisfying the upper bound $M$ such that $|x_i| \le M_i \ \forall i$.

$\Delta a_{ij}$                      Perturbation of the coefficient $a_{ij}$.


III Amplitude and Period Bounds on Limit Cycles


In general, the quantization nonlinearity satisfies

$$|x - Q[x]| \le \varrho \cdot q, \quad \forall x \in \Re, \tag{1}$$

where $\varrho$ is the normalized quantization error. In particular, for roundoff quantization, $\varrho = 0.5$,
and for truncation quantization, $\varrho = 1$. Note that all the filter parameters may be expressed as
integer multiples of the quantization step size $q$. Hence, for convenience, we normalize $q$ to unity
for all calculations. The quantization nonlinearity thus becomes an integer valued function, viz.,

$$Q[\cdot] : \Re \to Z. \tag{2}$$

In general, for all quantization schemes of interest, $Q[0] = 0$.
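
As a minimal sketch of the three quantization schemes discussed in this paper, with $q$ normalized to unity, the following Python functions may be used. The function names are illustrative only, and the tie-breaking rule for rounding (halves rounded away from zero) is an assumption, since the text does not fix it.

```python
import math

def q_sm_round(x):
    """Sign-magnitude roundoff: round |x| to the nearest integer.
    Assumption: ties (|x| = n + 0.5) are rounded away from zero."""
    return int(math.copysign(math.floor(abs(x) + 0.5), x))

def q_sm_trunc(x):
    """Sign-magnitude truncation: discard the fractional part of |x|."""
    return math.trunc(x)

def q_tc_trunc(x):
    """Two's complement truncation: round toward minus infinity."""
    return math.floor(x)

# Each scheme maps zero to zero and returns an integer.
samples = [-2.5, -1.7, -0.2, 0.0, 0.4, 1.5, 2.5]
rounded = [q_sm_round(x) for x in samples]
sm_truncated = [q_sm_trunc(x) for x in samples]
tc_truncated = [q_tc_trunc(x) for x in samples]
```

For these samples the rounding error never exceeds 0.5 and both truncation errors stay below 1, in line with the values of the normalized quantization error quoted above.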

We consider a digital filter of order $m$ in its minimal state-space representation $\{A, B, C, D\}$,
that is,

$$\mathbf{x}(k+1) = A \cdot \mathbf{x}(k) + B \cdot u(k); \tag{3}$$

$$y(k) = C \cdot \mathbf{x}(k) + D \cdot u(k), \tag{4}$$

where $\mathbf{x} \in \Re^{m}$ is the state, $u$ is the input, and $y$ is the output. Also, $A \in \Re^{m \times m}$. For addressing
limit cycle performance, we consider the zero input recursive state equation

$$\mathbf{x}(k+1) = A \cdot \mathbf{x}(k). \tag{5}$$

Unless otherwise stated, we only consider linearly stable filters. Hence, all eigenvalues of $A$ are
inside the unit circle in $C$.

Now, under finite wordlength conditions, the appearance of the pertinent quantization nonlinearity
in (5) may be modeled as

$$\mathbf{x}(k+1) = Q[A \cdot \mathbf{x}(k)]. \tag{6}$$

The effect of this nonlinearity depends on whether the result of a product can be stored with full
precision or whether quantization is performed immediately after each product is computed.
Considering (5) and noting that $\mathbf{x}(k) = \{x_i\} \in \Re^{m}$ and $A = \{a_{ij}\} \in \Re^{m \times m}$,
we get the following:

If the products can be stored with full precision, that is, if a double-length accumulator is
available,

$$\mathbf{x}(k+1) = \begin{pmatrix} Q\left[\sum_{j=1}^{m} a_{1j} \cdot x_j(k)\right] \\ \vdots \\ Q\left[\sum_{j=1}^{m} a_{mj} \cdot x_j(k)\right] \end{pmatrix}; \tag{7}$$

and, on the other hand, if quantization is performed immediately after each product is computed,
that is, if only a single-length accumulator is available,

$$\mathbf{x}(k+1) = \begin{pmatrix} Q[a_{11} \cdot x_1(k)] + Q[a_{12} \cdot x_2(k)] + \ldots + Q[a_{1m} \cdot x_m(k)] \\ \vdots \\ Q[a_{m1} \cdot x_1(k)] + Q[a_{m2} \cdot x_2(k)] + \ldots + Q[a_{mm} \cdot x_m(k)] \end{pmatrix}. \tag{8}$$

Since $q$ has been normalized to unity, noting (1), (7) and (8) may be expressed in a unified
manner as

$$\mathbf{x}(k+1) = A \cdot \mathbf{x}(k) + \mathbf{e}(k), \quad \text{with } |e_i(k)| \le N \cdot \varrho, \tag{9}$$

where $\mathbf{e}(k) = \{e_i(k)\} \in \Re^{m}$ and $e_i(k) \in \Re$. Note that, if (7) is applicable, $N = 1$; if (8) is
applicable, $N = m$.
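
The difference between the two accumulator models in (7) and (8) can be illustrated with a short Python sketch. The function `next_state` and its arguments are hypothetical names introduced here for illustration; the quantizer `Q` is passed in as any integer-valued function, and the example matrix is an arbitrary choice with exactly representable coefficients.

```python
import math

def next_state(A, x, Q, double_length=True):
    """One zero-input iteration of the quantized filter.

    double_length=True  -> eqn. (7): one quantizer per row (N = 1).
    double_length=False -> eqn. (8): one quantizer per product (N = m).
    """
    m = len(x)
    if double_length:
        return [Q(sum(A[i][j] * x[j] for j in range(m))) for i in range(m)]
    return [sum(Q(A[i][j] * x[j]) for j in range(m)) for i in range(m)]

# Illustrative second-order example with two's complement truncation.
A = [[0.5, 0.25], [-0.25, 0.5]]
x = [3, 2]
x_double = next_state(A, x, math.floor, double_length=True)   # [2, 0]
x_single = next_state(A, x, math.floor, double_length=False)  # [1, 0]
```

The two models already disagree after one step for this state, which is why the bound in (9) carries the factor $N$.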


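We note that (9) is a description of a linear system driven by the bounded quantization
error input $\mathbf{e}(k)$. Hence, we have in fact converted the nonlinear systems in (7) and (8) into the
linear system in (9). Now, the transfer function between $\mathbf{e}(k)$ and $\mathbf{x}(k)$ is

$$\frac{\mathbf{X}(z)}{\mathbf{E}(z)} = (zI - A)^{-1} \in \Re(z)^{m \times m}, \tag{10}$$

where $\mathbf{X}$ and $\mathbf{E}$ are the $z$-transforms of $\mathbf{x}$ and $\mathbf{e}$, respectively. This, when expanded, may be
expressed as

$$\mathbf{X}(z) = \begin{pmatrix} H_{11}(z) & H_{12}(z) & \ldots & H_{1m}(z) \\ \vdots & \vdots & & \vdots \\ H_{m1}(z) & H_{m2}(z) & \ldots & H_{mm}(z) \end{pmatrix} \mathbf{E}(z),$$

where $H_{ij}(z) \in \Re(z)$. Hence,

$$X_i(z) = \sum_{j=1}^{m} H_{ij}(z) \cdot E_j(z), \quad i = 1, 2, \ldots, m, \tag{11}$$

where $\mathbf{X}(z) = \{X_i\}$ and $\mathbf{E}(z) = \{E_j\}$. Taking the inverse $z$-transform of the above, we get

$$x_i(k) = \sum_{j=1}^{m} h_{ij}(k) * e_j(k), \quad i = 1, 2, \ldots, m,$$

where $h_{ij}(k)$ is the impulse response of $H_{ij}(z)$. Hence

$$x_i(k) = \sum_{j=1}^{m} \sum_{r=0}^{\infty} h_{ij}(r) \cdot e_j(k - r), \quad i = 1, 2, \ldots, m. \tag{12}$$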
Combining (12) with the fact that $|e_j(k)| \le N \cdot \varrho$ for $j = 1, 2, \ldots, m$, we obtain

$$|x_i(k)| \le N \cdot \varrho \cdot \sum_{j=1}^{m} \sum_{k=0}^{\infty} |h_{ij}(k)|. \tag{13}$$

Eqn. (13) may now be used to provide upper bounds for each state $x_i$ as follows:

$$M_i = N \cdot \varrho \cdot \sum_{j=1}^{m} \sum_{k=0}^{\infty} |h_{ij}(k)|, \quad i = 1, 2, \ldots, m. \tag{14}$$

We realize that, in order to estimate a useful upper bound for each $x_i$, we need to compute
$\sum_{k=0}^{\infty} |h_{ij}(k)|$ for a given filter. We address this now. Consider the transfer function $H_{ij}(z)$.

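All poles of $H_{ij}(z)$ are distinct:

In this case, $H_{ij}(z)$ may be expressed as

$$H_{ij}(z) = K_{ij} + \frac{r_{ij}^{(1)}}{1 - P^{(1)} z^{-1}} + \ldots + \frac{r_{ij}^{(m)}}{1 - P^{(m)} z^{-1}},$$

where $r_{ij}^{(\ell)}, P^{(\ell)} \in C$ and $K_{ij} \in \Re$, for $i, j, \ell = 1, 2, \ldots, m$. Taking the inverse $z$-transform,
we have

$$h_{ij}(k) = K_{ij} \cdot \delta(k) + r_{ij}^{(1)} [P^{(1)}]^{k} + \ldots + r_{ij}^{(m)} [P^{(m)}]^{k},$$

where $\delta(k)$ is the Dirac delta function. Therefore

$$\sum_{k=0}^{\infty} |h_{ij}(k)| \le \sum_{k=0}^{\infty} \left\{ |K_{ij}| |\delta(k)| + |r_{ij}^{(1)}| |P^{(1)}|^{k} + \ldots + |r_{ij}^{(m)}| |P^{(m)}|^{k} \right\}
= |K_{ij}| + |r_{ij}^{(1)}| (1 - |P^{(1)}|)^{-1} + \ldots + |r_{ij}^{(m)}| (1 - |P^{(m)}|)^{-1}.$$

This, when expanded, gives

$$\sum_{j=1}^{m} \sum_{k=0}^{\infty} |h_{ij}(k)| \le \sum_{j=1}^{m} |K_{ij}| + (1 - |P^{(1)}|)^{-1} \cdot \sum_{j=1}^{m} |r_{ij}^{(1)}| + \ldots + (1 - |P^{(m)}|)^{-1} \cdot \sum_{j=1}^{m} |r_{ij}^{(m)}|$$

for $i = 1, 2, \ldots, m$. Hence

$$|x_i(k)| \le N \cdot \varrho \cdot \left\{ \sum_{j=1}^{m} |K_{ij}| + (1 - |P^{(1)}|)^{-1} \cdot \sum_{j=1}^{m} |r_{ij}^{(1)}| + \ldots + (1 - |P^{(m)}|)^{-1} \cdot \sum_{j=1}^{m} |r_{ij}^{(m)}| \right\}, \tag{15}$$

for $i = 1, 2, \ldots, m$. Note that convergence of the above is guaranteed due to linear stability of
the digital filter.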
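
As a cross-check on such bounds, the double sum in (14) can also be evaluated numerically. The sketch below is not the partial-fraction technique described above: it assumes that, for the system in (9), the impulse response from $\mathbf{e}$ to $\mathbf{x}$ is given by successive powers of $A$, and it simply accumulates $|[A^r]_{ij}|$ until the terms become negligible. The function name, tolerance and example matrix are illustrative assumptions. Because the sum is truncated, the result slightly underestimates (14), so some care is needed when a value lies very close to an integer.

```python
def amplitude_bounds(A, N, rho, tol=1e-12, max_terms=100000):
    """Approximate M_i = N * rho * sum_j sum_r |[A^r]_{ij}| by a truncated sum."""
    m = len(A)
    # P holds the current power A^r, starting from A^0 = I.
    P = [[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]
    sums = [0.0] * m
    for _ in range(max_terms):
        for i in range(m):
            sums[i] += sum(abs(v) for v in P[i])
        if max(abs(v) for row in P for v in row) < tol:
            break  # remaining terms are negligible for a linearly stable A
        P = [[sum(P[i][k] * A[k][j] for k in range(m)) for j in range(m)]
             for i in range(m)]
    return [N * rho * s for s in sums]

# Illustrative stable example (eigenvalues 0.5 and 0.25), double-length
# accumulator (N = 1) with truncation (rho = 1).
A = [[0.5, 0.25], [0.0, 0.25]]
M = amplitude_bounds(A, N=1, rho=1.0)
```

For this upper-triangular example the sums can be verified by hand: the first row gives 8/3 and the second row gives 4/3.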


Remark. The method adopted in [10] tends to be easier to implement and more general
with regard to its capability of handling the presence of poles of higher multiplicity. However,
our experience has been that the technique described above often leads to lower upper bounds.
Note that the technique in [10] utilizes an interpretation that involves a cascade of first-order
sections to obtain a bound for $|x_i|$; the technique above utilizes an interpretation that involves a
parallel combination. Of course, no one technique will provide a lower bound for all situations.
If computation cost is not a concern, one can run both techniques and utilize the lower value of the
bound.

$H_{ij}(z)$ contains a pole with multiplicity $\zeta$:

At this point, due mainly to its ease of implementation, we utilize the technique in [10] where
the above expression is interpreted as a cascade of $\zeta$ first-order sections. For each first-order
section, the inverse $z$-transform is taken using the theory outlined in the distinct pole case.
Consider

$$\frac{1}{(1 - P z^{-1})^{\zeta}} = \frac{1}{(1 - P z^{-1})(1 - P z^{-1}) \ldots (1 - P z^{-1})}.$$

Taking the inverse $z$-transform, we get

$$\sum_{k=0}^{\infty} \left| \mathcal{Z}^{-1} \left\{ \frac{1}{(1 - P z^{-1})(1 - P z^{-1}) \ldots (1 - P z^{-1})} \right\} \right| \le \left( \frac{1}{1 - |P|} \right)^{\zeta}.$$

This expression is now substituted for the pole of multiplicity $\zeta$.


Lemma 1: The zero input response of the state $\mathbf{x}(k)$ of the digital filter described by eqn. (7) or
(8) is periodic. Its period $T$ satisfies

$$T \le \prod_{i=1}^{m} (2 \cdot \bar{M}_i + 1) = T_{max}, \tag{16}$$

where $\bar{M}_i$ is the largest integer not more than $M_i$ in eqn. (14).

Proof: Consider eqn. (7) or (8). The steady-state solution of each state $x_i(k)$ will satisfy

$$|x_i(k)| \le M_i, \quad \forall k, \quad i = 1, 2, \ldots, m.$$

Under fixed-point arithmetic, $\mathbf{x}(k) \in Z^{m}$ and hence,

$$|x_i(k)| \le \bar{M}_i, \quad \forall k, \quad i = 1, 2, \ldots, m.$$

$x_i(k)$ can therefore take only a finite number of values, namely, $(2 \cdot \bar{M}_i + 1)$. As a result of this,
$\mathbf{x}(k)$ can take only a finite number of values, namely,

$$\prod_{i=1}^{m} (2 \cdot \bar{M}_i + 1).$$

Note that the current state vector $\mathbf{x}(k)$ uniquely determines the next state vector $\mathbf{x}(k+1)$
through the function $Q[\cdot]$. Thus, $\mathbf{x}(k)$ must be periodic in $k$. Its period is in fact bounded by

$$T_{max} = \prod_{i=1}^{m} (2 \cdot \bar{M}_i + 1). \tag{17}$$


We  now  have  bounds  on  the  amplitude  as  well  as  the  period  on  the  possible  limit  cycles. 
This  information  will  be  invaluable  for  developing  our  search  algorithm. 
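
The period bound of Lemma 1 is straightforward to evaluate once the amplitude bounds are known. The following Python sketch assumes the bounds $M_i$ are supplied as a list of floats; the function name and the sample values are illustrative.

```python
import math

def period_bound(M):
    """T_max = prod_i (2 * floor(M_i) + 1), as in eqn. (17)."""
    return math.prod(2 * math.floor(Mi) + 1 for Mi in M)

# Illustrative amplitude bounds for a second-order filter.
M = [8.0 / 3.0, 4.0 / 3.0]
t_max = period_bound(M)  # (2*2 + 1) * (2*1 + 1) = 15
```

Note that $T_{max}$ grows multiplicatively with the filter order, which is what makes the reduction step described in the next section worthwhile.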
IV Algorithm Description and Its Computational Aspects


In  this  section,  we  formulate  the  theoretical  basis  for  the  algorithm  and  discuss  some  of  its 
computational  aspects. 


Definition 1: The digital filter realization in (9) is said to be globally asymptotically stable
(g.a.s.) if and only if, for any initial state $\mathbf{x}(0) \in Z^{m}$ with $\|\mathbf{x}(0)\|_\infty \le B$, where $B \in Z^{+}$, there
exists $L \in Z^{+}$ such that $\mathbf{x}(k) = 0$ for $k \ge L$.

Remark. Typically, g.a.s. is taken to hold when $\mathbf{x}(k) \to 0$ as $k \to \infty$ (under the conditions
above). However, due to the finite wordlength available in each register, the digital filter behaves
as a finite state machine, and Definition 1 suffices.

Lemma 2: Consider $\eta > 0$ and any initial state vector $\mathbf{x}(0)$ such that

$$|x_i(0)| \le H_i, \quad \text{for } i = 1, 2, \ldots, m,$$

with $H_i \ge M_i$, for $i = 1, 2, \ldots, m$. Then, there exists a sufficiently large positive number $\mathcal{L}$ such
that the digital filter in (7) or (8) satisfies

$$|x_i(k)| < M_i + \eta, \quad \forall k \ge \mathcal{L},$$

for $i = 1, 2, \ldots, m$.

Proof: Since the eigenvalues of $A$ are assumed to lie inside the unit circle in the complex plane,
the digital filter in eqn. (9) is in fact g.a.s. Hence, eqn. (9) will yield a set of nonhomogeneous
linear shift-invariant difference equations which will have its solution in two parts: A steady-state
solution $\mathbf{s}(k)$ and a transient solution $\mathbf{t}(k)$. Clearly, with g.a.s., given $\eta > 0$, we can choose
$k$ sufficiently large, say, $k \ge \mathcal{L}$, such that

$$\max |t_i(k)| < \eta, \quad \text{for } i = 1, 2, \ldots, m.$$

Since $\bar{M}_i \in Z^{+}$, for $k \ge \mathcal{L}$, $M_i + \eta$ will therefore act as a true upper bound for $x_i(k)$ in eqn. (9).


Hence, it suffices to check the state vectors in the set $\mathcal{S}^{(0)}$, where

$$\mathcal{S}^{(0)} = \left\{ \mathbf{x}(k) \in Z^{m} \,\big|\, |x_i(k)| \le M_i, \ i = 1, 2, \ldots, m \right\}, \tag{18}$$

to see if they are mapped to the zero vector by eqn. (9) after a finite number of mappings.

Computational  Aspects 

The computations within the algorithm are carried out in two stages. Initially, all vectors
$\mathbf{x}(k) \in \mathcal{S}^{(0)}$ which map to 0 in less than $T_{max}$ recursions (after all, if limit cycles exist, the
maximum period is $T_{max}$) are eliminated from $\mathcal{S}^{(0)}$ as they are now known to be stable. The
remaining vectors in $\mathcal{S}^{(0)}$ are then further checked for convergence (see Section B).

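Section A. Consider the set $\mathcal{V}^{(1)}$, where

$$\mathcal{V}^{(1)} = \left\{ \mathbf{x}(k) \in \mathcal{S}^{(0)} \,\big|\, Q[A \cdot \mathbf{x}(k)] = 0 \right\}. \tag{19}$$

Hence, $\mathcal{V}^{(1)}$ consists of all the vectors $\mathbf{x}(k) \in \mathcal{S}^{(0)}$ that map to 0 in one and only one iteration
of equation (7) or (8). Note that any other stable vector in $\mathcal{S}^{(0)}$ must map to $\mathcal{V}^{(1)}$ prior to
reaching 0. Hence, for further computations, we form

$$\mathcal{S}^{(1)} = \mathcal{S}^{(0)} \setminus \mathcal{V}^{(1)}. \tag{20}$$

Note that $\mathcal{K}[\mathcal{S}^{(1)}] = \mathcal{K}[\mathcal{S}^{(0)}] - \mathcal{K}[\mathcal{V}^{(1)}]$. In fact, one immediately notices that $\mathcal{K}[\mathcal{S}^{(0)}] = T_{max}$.

Furthermore, any vector in $\mathcal{S}^{(1)}$ which is mapped to $\mathcal{V}^{(1)}$ by (7) or (8) in one iteration will
also converge to 0. Hence, we form the set $\mathcal{V}^{(2)}$, where

$$\mathcal{V}^{(2)} = \left\{ \mathbf{x}(k) \in \mathcal{S}^{(1)} \,\big|\, Q[A \cdot \mathbf{x}(k)] \in \mathcal{V}^{(1)} \right\}. \tag{21}$$

Hence, $\mathcal{V}^{(2)}$ consists of all the vectors $\mathbf{x}(k) \in \mathcal{S}^{(1)}$ that map to 0 in exactly two iterations of
equation (7) or (8). Hence, for further computations, we form

$$\mathcal{S}^{(2)} = \mathcal{S}^{(1)} \setminus \mathcal{V}^{(2)}. \tag{22}$$

Note that $\mathcal{K}[\mathcal{S}^{(2)}] = \mathcal{K}[\mathcal{S}^{(1)}] - \mathcal{K}[\mathcal{V}^{(2)}]$.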

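Likewise, we get the following sets: For $L = 1, 2, \ldots, T_{max}$,

$$\mathcal{V}^{(L)} = \left\{ \mathbf{x}(k) \in \mathcal{S}^{(L-1)} \,\big|\, Q[A \cdot \mathbf{x}(k)] \in \mathcal{V}^{(L-1)} \right\}, \tag{23}$$

and

$$\mathcal{S}^{(L)} = \mathcal{S}^{(L-1)} \setminus \mathcal{V}^{(L)}. \tag{24}$$

Note that $\mathcal{K}[\mathcal{S}^{(L)}] = \mathcal{K}[\mathcal{S}^{(L-1)}] - \mathcal{K}[\mathcal{V}^{(L)}]$.

The conditions under which this construction is terminated and their implications are as follows:

(1) If

$$\mathcal{S}^{(L)} = \emptyset, \quad \text{for some } L = 1, 2, \ldots, T_{max} - 1, \tag{25}$$

all vectors in $\mathcal{S}^{(0)}$ are convergent.

(2) If

$$\mathcal{V}^{(L)} = \emptyset, \quad \text{for some } L = 1, 2, \ldots, T_{max}, \tag{26}$$

then

$$\mathcal{S}^{(\ell)} = \mathcal{S}^{(\ell-1)}, \quad \text{for } \ell = L, L+1, \ldots, T_{max}. \tag{27}$$

Under this situation, the remaining vectors in $\mathcal{S}^{(L-1)}$ (there are $\mathcal{K}[\mathcal{S}^{(L-1)}]$ of them) will be further
checked for convergence (see Section B).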

Remark.  Upon  a  little  reflection,  one  notices  that  must  either  be  empty  or  contain  one 

and  only  one  vector  from  . 
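
The reverse mapping construction of Section A can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: the state-update function `step` stands for one iteration of (7) or (8), the integer bounds are passed as `Mbar`, and the starting set containing only the zero vector (used so that the first pass reproduces (19)) is an assumption made here for convenience. The example filter is hypothetical.

```python
import math
from itertools import product

def reverse_mapping(step, Mbar):
    """Remove from S^(0) every vector found to reach 0 by reverse mapping.

    Returns the set of vectors left over when the construction stops,
    i.e. the vectors that still need the orbit check of Section B.
    """
    S = set(product(*[range(-b, b + 1) for b in Mbar]))   # S^(0)
    V_prev = {tuple(0 for _ in Mbar)}                      # assumed V^(0) = {0}
    while S:
        V = {x for x in S if step(x) in V_prev}            # V^(L)
        if not V:                                          # condition (2)
            break
        S -= V                                             # S^(L)
        V_prev = V
    return S                                               # empty under condition (1)

# Hypothetical double-length example, A = [[0.5, 0.25], [0, 0.25]].
def step_floor(x):   # two's complement truncation
    return (math.floor(0.5 * x[0] + 0.25 * x[1]), math.floor(0.25 * x[1]))

def step_trunc(x):   # sign-magnitude truncation
    return (math.trunc(0.5 * x[0] + 0.25 * x[1]), math.trunc(0.25 * x[1]))

left_floor = reverse_mapping(step_floor, [2, 1])
left_trunc = reverse_mapping(step_trunc, [2, 1])
```

For this example the sign-magnitude truncation case empties the set after two passes, whereas two's complement truncation leaves nine vectors (for instance $(-1, 0)$, which maps to itself) to be examined further.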


Section B. Although the reverse mapping procedure outlined above reduces the computational
complexity considerably, it may not capture all the vectors in $\mathcal{S}^{(L)}$, $L = 1, 2, \ldots, T_{max}$, that
map to 0 within $T_{max}$ iterations. This is due to the fact that there may be vectors in $\mathcal{S}^{(L-1)}$ that
map to 0 through a vector not belonging to $\mathcal{S}^{(0)}$. Hence, when encountered with condition (2)
above, convergence of each remaining vector in $\mathcal{S}^{(L-1)}$ is determined by checking whether it is
mapped to 0 in less than $T_{max}$ iterations through either (7) or (8), whichever is applicable. This exhaustive
technique is in fact an extension of that given in [10] to digital filters represented in their state-space
realization. However, we must emphasize the significant computational advantage gained
by first invoking the reverse mapping construction procedure in Section A.

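Assuming condition (2) has occurred, let

$$\mathcal{S}^{(L-1)} = \left\{ \mathbf{x}_i ; \ i = 1, 2, \ldots, \mathcal{K}[\mathcal{S}^{(L-1)}] \right\}. \tag{28}$$

Note that, when condition (2) has occurred, from (27), $\mathcal{S}^{(L-1)} = \mathcal{S}^{(T_{max})}$. For each vector $\mathbf{x}_i \in \mathcal{S}^{(L-1)}$,
construct the orbit consisting of all state vectors $\mathbf{x}_i(j)$, for $j = 1, 2, \ldots, T_{max}$, that
are consecutively generated by (7) or (8) (whichever is applicable) with $\mathbf{x}_i$ as the initial state,
that is, $\mathbf{x}_i = \mathbf{x}_i(0)$.

For each $i = 1, 2, \ldots, \mathcal{K}[\mathcal{S}^{(L-1)}]$, the conditions under which the construction of each orbit
is terminated and their implications are as follows:

(1) If

$$\mathbf{x}_i(j) = 0, \quad \text{for some } j = 1, 2, \ldots, T_{max}, \tag{29}$$

then $\mathbf{x}_i$ together with each vector in the orbit is convergent.

(2) If

$$\mathbf{x}_i(j) = \mathbf{x}_i(k), \quad \text{for } j \ne k, \tag{30}$$

then $\mathbf{x}_i$ gives rise to limit cycles.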

Remark.  These  are  in  fact  the  only  conditions  that  can  occur  when  either  (7)  or  (8)  generate 
the  orbit. 
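
The orbit test of Section B admits a compact sketch. The function below is illustrative only: `step` again stands for one iteration of (7) or (8), and the return labels are names chosen here. The third label is included purely as a safeguard in case the iteration budget is exhausted before either terminating condition is met.

```python
import math

def classify_orbit(step, x0, t_max):
    """Follow the orbit of x0 for at most t_max iterations.

    Returns "convergent" if the zero vector is reached (condition (1)),
    "limit cycle" if a state repeats first (condition (2)).
    """
    seen = {x0}
    x = x0
    for _ in range(t_max):
        x = step(x)
        if all(c == 0 for c in x):
            return "convergent"
        if x in seen:
            return "limit cycle"
        seen.add(x)
    return "undecided"  # safeguard; not expected when t_max bounds the state count

# Hypothetical double-length example with two's complement truncation.
def step_floor(x):
    return (math.floor(0.5 * x[0] + 0.25 * x[1]), math.floor(0.25 * x[1]))

result_a = classify_orbit(step_floor, (2, 1), 15)    # (2,1) -> (1,0) -> (0,0)
result_b = classify_orbit(step_floor, (0, -1), 15)   # (0,-1) -> (-1,-1) -> (-1,-1)
```

Because every state on a convergent orbit is itself convergent, an implementation could also mark all visited states at once instead of re-running the test for each of them.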

Observation. 

If the upper bound $M_i < 1$ for all $i$, we observe that $\mathcal{S}^{(0)}$ given by (18) will only contain 0.
Consider a digital filter implementation given by (7), i.e., $N = 1$. From (14),

$$\varrho \cdot \sum_{j=1}^{m} \sum_{k=0}^{\infty} |h_{ij}(k)| = M_i < 1 \tag{31}$$

for $i = 1, 2, \ldots, m$.

If a sign-magnitude quantizing scheme is considered, $\varrho = 0.5$, equation (31) can be written as

$$\sum_{j=1}^{m} \sum_{k=0}^{\infty} |h_{ij}(k)| < 2. \tag{32}$$

Therefore we can conclude that a digital filter in a double-length accumulator environment
satisfying eqn. (32) is globally asymptotically stable.


V  Perturbation  of  Filter  Coefficient  Matrix 


In constructing the region of g.a.s. in the coefficient space, perturbations incurred in storing
each filter coefficient must also be considered. Such perturbations are typically due to finite
wordlength effects that require rounding or truncation of the true coefficient value.

The algorithm described in the previous section provides information regarding g.a.s. of a
given filter with a nominal coefficient matrix $A = \{a_{ij}\} \in \Re^{m \times m}$. Once this is done, we now
consider a small perturbation $\Delta a_{ij}$ of each coefficient about its nominal value $a_{ij}$. However, for
a given state vector $\mathbf{x}(k)$, this perturbation may not necessarily alter the next state $\mathbf{x}(k+1)$
obtained since it is entirely possible that

$$\mathbf{x}(k+1) = Q[(A + \Delta A) \cdot \mathbf{x}(k)] = Q[A \cdot \mathbf{x}(k)], \tag{33}$$

where $\Delta A = \{\Delta a_{ij}\} \in \Re^{m \times m}$.

Depending on the number of quantizers per row, that is, depending on whether a double- or
single-length accumulator is available, (33) is interpreted differently.

Double-length  accumulator 


It is evident that the upper bound $M_i$ estimated for the nominal value of the coefficient
matrix $\{a_{ij}\} \in \Re^{m \times m}$ will no longer be valid for a perturbed system $\{a_{ij} + \Delta a_{ij}\} \in \Re^{m \times m}$. We
define an upper bound $M_i$, $i = 1, 2, \ldots, m$, which will be valid for the nominal coefficient matrix
$\{a_{ij}\}_{m \times m}$ and a perturbed coefficient matrix $\{a_{ij} + \Delta a_{ij}\}_{m \times m}$. Consider the equation

$$Q\left[ \sum_{j=1}^{m} (a_{ij} + \Delta a_{ij}) \cdot x_j(k) \right] = Q\left[ \sum_{j=1}^{m} a_{ij} \cdot x_j(k) \right], \tag{34}$$

$i = 1, 2, \ldots, m$, and where $x_j \in \mathbf{x}$ is taken from the set

$$\mathcal{S} = \left\{ \mathbf{x} \,\big|\, |x_i| \le M_i, \ i = 1, 2, \ldots, m; \ \mathbf{x} \in Z^{m} \right\}. \tag{35}$$

Due to the choice of $M$, $\mathcal{S}$ is valid for the systems described by (7) with coefficient matrices
$\{a_{ij}\}_{m \times m}$ and $\{a_{ij} + \Delta a_{ij}\}_{m \times m}$. Let $\mathcal{G}$ be the set consisting of all the perturbations $\Delta a_{ij}$ around
the nominal value $a_{ij}$ which satisfy equation (34). This is in fact the robustness region
associated with the nominal value of the coefficient matrix $\{a_{ij}\}$. We formulate the problem of
finding the robustness region in the following manner.

Any perturbation $\Delta a_{ij}$ satisfying eqn. (34) for all $\mathbf{x} \in \mathcal{S}$ will be in the set $\mathcal{G}$.

Since it is not possible to consider an arbitrarily large area around the nominal coefficient value,
we use an analytical method to make an estimate for this region to which the algorithm can be
applied. Once the region is determined it will be covered by a suitable grid and eqn. (34) will
be used to determine if each grid point, corresponding to a particular $\Delta a_{ij}$ in this estimated
region, is in fact in $\mathcal{G}$.
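
The grid test based on (34) can be sketched as below for the double-length accumulator case. The function name, the representation of the perturbation as a full matrix `dA`, and the example values are assumptions made for illustration; the quantizer `Q` is any integer-valued function, here sign-magnitude truncation.

```python
import math
from itertools import product

def in_robustness_region(A, dA, Mbar, Q):
    """True if Q[(A + dA) x] == Q[A x], row by row, for every integer x in S."""
    m = len(A)
    for x in product(*[range(-b, b + 1) for b in Mbar]):
        for i in range(m):
            nominal = Q(sum(A[i][j] * x[j] for j in range(m)))
            perturbed = Q(sum((A[i][j] + dA[i][j]) * x[j] for j in range(m)))
            if nominal != perturbed:
                return False
    return True

# Hypothetical nominal filter and two candidate grid points.
A = [[0.5, 0.25], [0.0, 0.25]]
Mbar = [2, 1]
zero = [[0.0, 0.0], [0.0, 0.0]]
plus = [[0.01, 0.0], [0.0, 0.0]]
minus = [[-0.01, 0.0], [0.0, 0.0]]
ok_zero = in_robustness_region(A, zero, Mbar, math.trunc)
ok_plus = in_robustness_region(A, plus, Mbar, math.trunc)
ok_minus = in_robustness_region(A, minus, Mbar, math.trunc)
```

In this example the state $(2, 0)$ produces a sum lying exactly on a discontinuity of the quantizer, so a small decrease of $a_{11}$ changes the next state while an equal increase does not; the admissible perturbations are therefore one-sided here.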

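To proceed, it is convenient to identify the discontinuities associated with the nonlinearity $Q[\cdot]$.
For sign-magnitude roundoff,

$$\mathcal{D}_{smr} = \left\{ b_r \in \Re \,\big|\, b_r = r + \tfrac{1}{2}, \ r \in Z \right\}; \tag{36}$$

for sign-magnitude truncation quantization,

$$\mathcal{D}_{smt} = \left\{ b_r \in \Re \,\big|\, b_r = r, \ r \in Z \setminus \{0\} \right\}; \tag{37}$$

for two's complement truncation quantization,

$$\mathcal{D}_{tct} = \left\{ b_r \in \Re \,\big|\, b_r = r, \ r \in Z \right\}. \tag{38}$$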
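For each $\mathbf{x} \in \mathcal{S}$, a region corresponding to the robustness region in eqn. (34) applicable to the
pertinent quantization schemes in (36), (37), or (38) is defined. Let the region corresponding to
the $i$-th state $x_i$ of $\mathbf{x}$ be $\mathcal{G}_{\mathbf{x}}^{(i)}$. Then, we have the following:

For sign-magnitude roundoff quantization,

$$\mathcal{G}_{\mathbf{x}}^{(i)} = \begin{cases}
\left\{ \Delta a_{ij} \,\big|\, b_{r-1} - \sum_{j=1}^{m} a_{ij} x_j < \sum_{j=1}^{m} \Delta a_{ij} x_j < b_r - \sum_{j=1}^{m} a_{ij} x_j \right\} & \text{for } b_{r-1} < \sum_{j=1}^{m} a_{ij} x_j < b_r \text{ and } r \ge 1 \\
\left\{ \Delta a_{ij} \,\big|\, b_{r-1} - \sum_{j=1}^{m} a_{ij} x_j < \sum_{j=1}^{m} \Delta a_{ij} x_j < b_r - \sum_{j=1}^{m} a_{ij} x_j \right\} & \text{for } b_{r-1} < \sum_{j=1}^{m} a_{ij} x_j < b_r \text{ and } r \le -1 \\
\left\{ \Delta a_{ij} \,\big|\, b_{-1} - \sum_{j=1}^{m} a_{ij} x_j < \sum_{j=1}^{m} \Delta a_{ij} x_j < b_0 - \sum_{j=1}^{m} a_{ij} x_j \right\} & \text{for } b_{-1} < \sum_{j=1}^{m} a_{ij} x_j < b_0
\end{cases}
\quad \text{where } b_r \in \mathcal{D}_{smr}; \tag{39}$$

for sign-magnitude truncation quantization,

$$\mathcal{G}_{\mathbf{x}}^{(i)} = \begin{cases}
\left\{ \Delta a_{ij} \,\big|\, b_{r-1} - \sum_{j=1}^{m} a_{ij} x_j < \sum_{j=1}^{m} \Delta a_{ij} x_j < b_r - \sum_{j=1}^{m} a_{ij} x_j \right\} & \text{for } b_{r-1} < \sum_{j=1}^{m} a_{ij} x_j < b_r \text{ and } r \ge 2 \\
\left\{ \Delta a_{ij} \,\big|\, b_{r-1} - \sum_{j=1}^{m} a_{ij} x_j < \sum_{j=1}^{m} \Delta a_{ij} x_j < b_r - \sum_{j=1}^{m} a_{ij} x_j \right\} & \text{for } b_{r-1} < \sum_{j=1}^{m} a_{ij} x_j < b_r \text{ and } r \le -1 \\
\left\{ \Delta a_{ij} \,\big|\, b_{-1} - \sum_{j=1}^{m} a_{ij} x_j < \sum_{j=1}^{m} \Delta a_{ij} x_j < b_{+1} - \sum_{j=1}^{m} a_{ij} x_j \right\} & \text{for } b_{-1} < \sum_{j=1}^{m} a_{ij} x_j < b_{+1}
\end{cases}
\quad \text{where } b_r \in \mathcal{D}_{smt}; \tag{40}$$

for two's complement truncation quantization,

$$\mathcal{G}_{\mathbf{x}}^{(i)} = \left\{ \Delta a_{ij} \,\big|\, b_{r-1} - \sum_{j=1}^{m} a_{ij} x_j < \sum_{j=1}^{m} \Delta a_{ij} x_j < b_r - \sum_{j=1}^{m} a_{ij} x_j \right\} \quad \text{for } b_{r-1} < \sum_{j=1}^{m} a_{ij} x_j < b_r,
\quad \text{where } b_r \in \mathcal{D}_{tct}. \tag{41}$$

For a particular state in the vector $\mathbf{x}$, the region can be computed using eqns. (39), (40) or (41).
For all $\mathbf{x} \in \mathcal{S}$ the total robustness region is given by

$$\mathcal{G} = \bigcap_{\forall \mathbf{x} \in \mathcal{S}} \mathcal{G}_{\mathbf{x}}^{(i)}. \tag{42}$$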
From (39), (40) or (41), we may estimate suitable values for the region of robustness for each
quantization scheme. The computations involved in determining the robustness region for the
two's complement quantization scheme are given below; the analysis can be extended to the other
quantization schemes in a similar manner. For the two's complement truncation quantization
scheme, from eqn. (41), we choose $\Delta a_{ij}$ such that

$$\left| \sum_{j=1}^{m} \Delta a_{ij} x_j \right| < \min \left\{ \left| b_r - \sum_{j=1}^{m} a_{ij} x_j \right|, \ \left| b_{r-1} - \sum_{j=1}^{m} a_{ij} x_j \right| \right\}, \tag{43}$$

$i = 1, 2, \ldots, m$. The left hand side of (43) satisfies

$$\left| \sum_{j=1}^{m} \Delta a_{ij} x_j \right| \le \sum_{j=1}^{m} |\Delta a_{ij}| |x_j|. \tag{44}$$

We estimate the perturbations $\Delta a_{ij}$ such that they satisfy the following inequality,

$$\sum_{j=1}^{m} |\Delta a_{ij}| |x_j| < \min \left\{ \left| b_r - \sum_{j=1}^{m} a_{ij} x_j \right|, \ \left| b_{r-1} - \sum_{j=1}^{m} a_{ij} x_j \right| \right\}, \tag{45}$$

where $i = 1, 2, \ldots, m$.

Since we are estimating the region for the nominal value in the coefficient space, we will initially
take each $x_j$ to be bounded by its corresponding $M_j$. Therefore the above inequality is satisfied
if

$$\sum_{j=1}^{m} |\Delta a_{ij}| \cdot |M_j| < \min_{\mathbf{x} \in \mathcal{S}} \left\{ \left| b_r - \sum_{j=1}^{m} a_{ij} x_j \right|, \ \left| b_{r-1} - \sum_{j=1}^{m} a_{ij} x_j \right| \right\}. \tag{46}$$

Now (46) can be used to estimate the robustness region, where $\hat{\mathcal{G}}$ is given by

$$\hat{\mathcal{G}} = \left\{ \Delta a_{ij} \,\Big|\, \sum_{j=1}^{m} |\Delta a_{ij}| |M_j| < \min_{\mathbf{x} \in \mathcal{S}} \left\{ \left| b_r - \sum_{j=1}^{m} a_{ij} x_j \right|, \ \left| b_{r-1} - \sum_{j=1}^{m} a_{ij} x_j \right| \right\} \right\}, \tag{47}$$

where $i = 1, 2, \ldots, m$. Clearly, $\hat{\mathcal{G}} \subset \mathcal{G}$.

But from eqn. (47) it is observed that in a degenerate case $\hat{\mathcal{G}}$ may only contain the zero perturbation
vector, due to the right hand side of eqn. (47) being equal to zero.

Note that for all quantization schemes considered, and for all $i$,

$$\min \left\{ \left| b_r - \sum_{j=1}^{m} a_{ij} x_j \right|, \ \left| b_{r-1} - \sum_{j=1}^{m} a_{ij} x_j \right| \right\} \le \frac{1}{2}, \tag{48}$$

due to the distance between any two discontinuities being always less than 1. Therefore the
maximum perturbation region $\mathcal{G}_{max}$ is given by the following set, and it is seen that
$\hat{\mathcal{G}} \subset \mathcal{G}_{max}$:

$$\mathcal{G}_{max} = \left\{ \Delta a_{ij} \,\Big|\, \sum_{j=1}^{m} |\Delta a_{ij}| |M_j| < \frac{1}{2} \right\}, \tag{49}$$

where $M_j$, for $j = 1, 2, \ldots, m$, are the upper bounds computed for the nominal value $\{a_{ij}\}$.


Single-length  accumulator 


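If there are $m$ quantizers per row as in (8), the robustness region is defined for each element $a_{ij}$
in the following manner:

$$\mathcal{G} = \left\{ \Delta a_{ij} \,\big|\, Q[(a_{ij} + \Delta a_{ij}) \cdot x_j] = Q[(a_{ij} \cdot x_j)], \ \forall \mathbf{x} \in \mathcal{S}_1 \right\}, \quad i, j = 1, 2, \ldots, m. \tag{50}$$

As in the double-length accumulator implementation, we define an upper bound $M_i$, valid for
systems with coefficient matrices $\{a_{ij}\}$ and $\{a_{ij} + \Delta a_{ij}\}$ and described by eqn. (8). The set $\mathcal{S}_1$
is defined as follows,

$$\mathcal{S}_1 = \left\{ \mathbf{x} \,\big|\, |x_i| \le M_i; \ i = 1, 2, \ldots, m; \ \mathbf{x} \in Z^{m} \right\}. \tag{51}$$

Let the robustness region corresponding to element $a_{ij}$ in the coefficient matrix for a particular
state vector $\mathbf{x}$ be $\mathcal{G}_{\mathbf{x}}^{(i,j)}$. Then, we have the following: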


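For sign-magnitude roundoff quantization,

$$\mathcal{G}_{\mathbf{x}}^{(i,j)} = \begin{cases}
\left\{ \Delta a_{ij} \,\big|\, b_{r-1} - a_{ij} \cdot x_j < \Delta a_{ij} \cdot x_j < b_r - a_{ij} \cdot x_j \right\} & \text{for } b_{r-1} < a_{ij} \cdot x_j < b_r \text{ and } r \ge 1 \\
\left\{ \Delta a_{ij} \,\big|\, b_{r-1} - a_{ij} \cdot x_j < \Delta a_{ij} \cdot x_j < b_r - a_{ij} \cdot x_j \right\} & \text{for } b_{r-1} < a_{ij} \cdot x_j < b_r \text{ and } r \le -1 \\
\left\{ \Delta a_{ij} \,\big|\, b_{-1} - a_{ij} \cdot x_j < \Delta a_{ij} \cdot x_j < b_0 - a_{ij} \cdot x_j \right\} & \text{for } b_{-1} < a_{ij} \cdot x_j < b_0
\end{cases}
\quad \text{where } b_r \in \mathcal{D}_{smr}, \ \forall \mathbf{x} \in \mathcal{S}_1; \tag{52}$$

for sign-magnitude truncation quantization,

$$\mathcal{G}_{\mathbf{x}}^{(i,j)} = \begin{cases}
\left\{ \Delta a_{ij} \,\big|\, b_{r-1} - a_{ij} \cdot x_j < \Delta a_{ij} \cdot x_j < b_r - a_{ij} \cdot x_j \right\} & \text{for } b_{r-1} < a_{ij} \cdot x_j < b_r \text{ and } r \ge 2 \\
\left\{ \Delta a_{ij} \,\big|\, b_{r-1} - a_{ij} \cdot x_j < \Delta a_{ij} \cdot x_j < b_r - a_{ij} \cdot x_j \right\} & \text{for } b_{r-1} < a_{ij} \cdot x_j < b_r \text{ and } r \le -1 \\
\left\{ \Delta a_{ij} \,\big|\, b_{-1} - a_{ij} \cdot x_j < \Delta a_{ij} \cdot x_j < b_{+1} - a_{ij} \cdot x_j \right\} & \text{for } b_{-1} < a_{ij} \cdot x_j < b_{+1}
\end{cases}
\quad \text{where } b_r \in \mathcal{D}_{smt}, \ \forall \mathbf{x} \in \mathcal{S}_1; \tag{53}$$

for two's complement truncation quantization,

$$\mathcal{G}_{\mathbf{x}}^{(i,j)} = \left\{ \Delta a_{ij} \,\big|\, b_{r-1} - a_{ij} \cdot x_j < \Delta a_{ij} \cdot x_j < b_r - a_{ij} \cdot x_j \right\} \quad \text{for } b_{r-1} < a_{ij} \cdot x_j < b_r,
\quad \text{where } b_r \in \mathcal{D}_{tct}, \ \forall \mathbf{x} \in \mathcal{S}_1. \tag{54}$$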

Hence  g.a.s.  can  be  gaurenteed  for  the  region 


s  =  n  (55) 

V(tJ) 

Using  a  similar  argument  as  in  the  case  of  a  double-length  accumulator,  we  can  estimate  a  region 
of  robustness  for  each  quantization  scheme  using  eqns.  (52),  (53)  or  (54).  The  computations 
involved  for  the  two’s  complement  case  is  outlined  below.  The  perturbation  is  seen  to  satisfy 
the  equation, 

AaijXj  <  min  {|6r  —  |6r-i  —  >  for  all  i,j  (56) 

xe5i 

|Aaija;j|  =  |Aaij|  •  |xj|  and  the  right  hand  side  of  eqn.  (56)  satisfies  (48)  therefore  it  can  be 
rewritten  in  the  following  form, 

|Aay|  •  \xj\  <  —  (57) 

Since we are only interested in finding an estimate for the region around the nominal value of
the coefficient, any $x_j$ is bounded by its corresponding $M_i$. The estimated region for the two's
complement quantization will consist of any perturbation satisfying the equation


=  jAaij 


|Aaij||iUi|  2  ’  —  1)2,...,
(58)


The region $\Omega$ can be covered by a suitable grid, and the grid points are applied to eqn. (50) to
obtain the stable region around the nominal point.

We note that if $M_i < 1$ for all $i$, then the perturbations $\Delta a_{ij}$ ($i, j = 1, 2, \ldots, m$) can take any
value.


VI  Some  Examples 


In this section the proposed search algorithm is applied to a dense grid in the coefficient space to
obtain the total global asymptotic stability region for a digital filter with zero input. Since it is
not possible to consider all points in the linear stability region, the dense grid provides a
reasonably good approximation to the g.a.s. region. Note that each point in the coefficient space
is associated with a neighborhood where the filter is stable. A 10-bit wordlength is assumed for
all computations; therefore the filter coefficients are quantized to a multiple of the quantization
step. Within the linear stability region, dark areas indicate points where limit cycles of some
period exist. It should be noted that the linear stability region does not have a common boundary
with the global asymptotic stability region obtained through this algorithm. Therefore, in all
figures, the boundary line which delimits the stability region from the unstable region does not
belong to the stability region.
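
As an independent cross-check of a single grid point (not the search algorithm proposed in this paper), the zero-input response of a second-order direct form filter can simply be simulated from every initial state in a small box. The sketch below is a minimal illustration under stated assumptions: states are integers in units of the quantization step, a double-length accumulator is modelled (one quantization after the sum), overflow is ignored, roundoff rounds halves away from zero, and the names `is_limit_cycle_free`, `box` and `max_steps` are hypothetical. A `True` result only says that no limit cycle was found from the tested initial states.

```python
from fractions import Fraction
import math


def q_round(y):
    """Sign-magnitude roundoff; halves are rounded away from zero (assumption)."""
    sign = 1 if y >= 0 else -1
    return sign * math.floor(abs(y) + Fraction(1, 2))


def q_mag_trunc(y):
    """Sign-magnitude truncation (truncate toward zero)."""
    return math.trunc(y)


def q_twos_trunc(y):
    """Two's complement truncation (round toward minus infinity)."""
    return math.floor(y)


def is_limit_cycle_free(a1, a2, quant, box=8, max_steps=200):
    """Simulate x1' = x2, x2' = Q(a2*x1 + a1*x2) from every state in the box.

    Returns False as soon as one trajectory revisits a nonzero state
    (a limit cycle) or fails to reach zero within max_steps.
    """
    for x1 in range(-box, box + 1):
        for x2 in range(-box, box + 1):
            state, seen = (x1, x2), set()
            for _ in range(max_steps):
                if state == (0, 0):
                    break
                if state in seen:
                    return False
                seen.add(state)
                state = (state[1], quant(a2 * state[0] + a1 * state[1]))
            else:
                return False  # undecided within max_steps
    return True
```

For example, with a1 = 0 and a2 = -9/10 the roundoff quantizer sustains the cycle (1, 0), (0, -1), (-1, 0), (0, 1), whereas magnitude truncation drives every tested state to zero.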


The most commonly encountered quantization schemes are analyzed, namely the sign-magnitude
roundoff, the sign-magnitude truncation and the two's complement truncation quantization
schemes. For all quantization schemes, results for both the single- and the double-length
accumulator implementations are provided. All results are provided for a $2 \times 2$ coefficient
matrix $\{a_{ij}\}$. All existing results for the named quantization schemes were verified.


Results  for  direct  form  realization  of  digital  filters 


For a direct form digital filter in state space formulation, the coefficient matrix is given by
eqn. (59):

\[ A = \begin{bmatrix} 0 & 1 \\ a_2 & a_1 \end{bmatrix} \tag{59} \]

Figure (1) shows the region obtained by the proposed algorithm for the sign-magnitude roundoff
quantization scheme in a double-length accumulator environment.


The region obtained is identical to the results given in [10]. For the same quantization
scheme and a single-length accumulator, the region obtained is given in Figure (2). The region
matches exactly with the one found in [10]. The regions for the two's complement and the
sign-magnitude truncation schemes were also verified. The region for the two's complement
truncation quantization in the single-length accumulator environment is shown in Figure (3).
Note that there is a graphical error in the region given in [10]. All other regions obtained by
the proposed algorithm match the regions given in [10].

Results  for  minimum  norm  realization  of  digital  filters 


The stability of digital filters in minimum norm form was also investigated for the case of a
$2 \times 2$ coefficient matrix $A$. The coefficient matrix is given by eqn. (60).

\[ A = \begin{bmatrix} \sigma & \omega \\ -\omega & \sigma \end{bmatrix} \tag{60} \]


The results for the sign-magnitude roundoff scheme for the single- and the double-length
accumulator environments are given in Figure (4). This region matches the region given in [7]. The
stable region for the sign-magnitude truncation scheme in a single-length or double-length
accumulator environment spans the entire region where $\sigma^2 + \omega^2 < 1$. The results are given in
Figure (5).

This supports previously known results.


For the two's complement truncation quantization with a double-length accumulator, the global
asymptotic stability region is given in Figure (6). This supports and also improves on the
previously known results given in [19]. To the authors' knowledge, no previous results are
available for the two's complement truncation quantization in a single-length accumulator
environment. The region of global asymptotic stability is summarized in Figure (7).

Note that for the two's complement quantization scheme in a double-length accumulator
environment, a series of points extends from the stability region into the instability region such
that

\[ \sigma < 0 \quad \text{and} \quad \omega = \pm\sigma. \tag{61} \]


The  following  coefficient  matrix  can  be  cited  as  an  example. 


>1  = 


672  672 

1024  1024 


672  672 

1024  1024 


(62) 


This series of points can only be observed by magnifying the area concerned. The subfigures
in Figure (6) show these areas magnified.


Robustness  regions 


Some examples of the robustness regions computed using the theory outlined in Section V
are given below.

The robustness regions for

\[ A = \begin{bmatrix} 0 & 1 \\ \frac{102}{1024} & \frac{102}{1024} \end{bmatrix} \tag{63} \]

for the sign-magnitude roundoff quantization scheme in double- and single-length accumulator
environments are given in Figure (8) and Figure (9), respectively. The robustness region
associated with the coefficient matrix given by eqn. (62) for the two's complement quantization
scheme in a double-length accumulator environment is given in Figure (10).


VII  Conclusion 


A new algorithm capable of determining global asymptotic stability of any fixed-point digital
filter represented in its state space formulation under zero-input conditions has been presented.
The search algorithm is independent of the type and the number of nonlinearities, and it has been
generalized to handle a digital filter of order m in its state space form.

The proposed algorithm is found to provide tighter bounds on the amplitude of limit cycles
in most cases, and it will always determine the stability or instability of a particular digital
filter. Significant improvements over the existing results for the two's complement truncation
scheme in both single- and double-length accumulator environments have been presented.

Current research is directed towards the following problems.

(1) Establishing regions within which limit cycles of a pre-specified period exist.

(2) Establishing regions within which limit cycles that are under a pre-specified bound exist.

(3) Extension of the algorithm to $\delta$-operator formulated systems. In fixed-point arithmetic
it is known that such systems always exhibit limit cycle behavior [20]. Therefore, in actual
applications, regions similar to the ones mentioned in items (1) and (2) may be of importance.

Figure 1: Region where a direct form digital filter is free of limit cycles for the
sign-magnitude roundoff quantization scheme in a double-length accumulator environment.

Figure 3: Region where a direct form digital filter is free of limit cycles
for the two's complement truncation quantization scheme in a single-length
accumulator environment.

Figure 5: Region where a filter represented in minimum norm form is free of
limit cycles for sign-magnitude truncation quantization, in double- and single-length
accumulator environments.

Figure 6: Region where a filter represented in minimum norm form is free of
limit cycles for two's complement truncation quantization, in a double-length
accumulator environment.

Figure 7: Region where a filter represented in minimum norm form is free of
limit cycles for two's complement truncation quantization, in a single-length
accumulator environment.

Figure 8: (a) Estimated robustness region, (b) actual robustness region, for the
sign-magnitude roundoff quantization scheme in a double-length accumulator environment.

Figure 10: (a) Estimated robustness region, (b) actual robustness region,
for the coefficient matrix given in eqn. (62).

ABSTRACT

In this paper, the problem of global asymptotic stability of $\delta$-operator formulated
one-dimensional (1-D) and multi-dimensional (m-D) discrete-time systems is analyzed for the
case of fixed-point implementations. It is shown that the free response of such a system
tends to produce incorrect equilibrium points if conventional quantization arithmetic
schemes such as truncation or rounding are used. Explicit necessary conditions for global
asymptotic stability are derived in terms of the sampling period. These conditions
demonstrate that, in almost all cases, fixed-point arithmetic does not allow for global asymptotic
stability in $\delta$-operator formulated discrete-time systems that use a short sampling time.
This is true for the 1-D as well as the m-D case.

I. INTRODUCTION


Discrete-time systems formulated in terms of the incremental difference operator (or
$\delta$-operator) have recently been receiving considerable attention in the technical literature
[1-4]. Most of this work focuses on the superior performance of the $\delta$-operator under finite
wordlength conditions when compared with the shift operator (or $q$-operator). In
particular, investigations of coefficient sensitivity and quantization noise properties have
revealed that $\delta$-operator formulations usually perform significantly better than their
$q$-operator counterparts [1-4]. This is especially true for high-speed applications where the
sampling rate is much larger than the underlying system bandwidth. Under these conditions,
$q$-operator formulated discrete-time systems tend to become ill-conditioned [1-2].

Although a large amount of work is available on the effects of coefficient sensitivity and
quantization noise, a deterministic study of the nonlinear behavior of discrete-time systems
formulated with the $\delta$-operator has not been undertaken. In the case of floating-point
(FLP) arithmetic, some results for feedback systems are available in [2].

In this work, we focus on the convergence behavior of the unforced system response and
global asymptotic stability of $\delta$-operator formulated discrete-time systems implemented in
fixed-point (FXP) arithmetic. In particular, via necessary conditions for stability, it will
be shown that such systems tend to produce DC limit cycles. We will also perform a
deterministic analysis of the finite wordlength properties of multi-dimensional $\delta$-operator
implemented discrete-time systems. The stability behavior in the m-D case has not been
previously investigated, although convergence to the true equilibrium point(s) is one of the
most fundamental requirements for any discrete-time system realization.

The structure of this article is as follows: In Section II, we introduce notation and
nomenclature for the 1-D case. The model for 1-D $\delta$-operator formulated discrete-time systems,
with and without quantization nonlinearities, is briefly discussed. Section III addresses
the problem of asymptotic stability for the 1-D case. In terms of ensuing DC limit cycles,
necessary conditions for global asymptotic stability are formulated. It is shown that,
when FXP arithmetic is used, stability of the linear system is often lost. Bounds on the
size of the deadbands are also provided. In Section IV, the multidimensional case is
investigated using sets of 1-D conditions for asymptotic stability. Section V provides concluding
remarks.


II. NOTATION AND NOMENCLATURE

Since our focus is the investigation of stability properties of $\delta$-operator formulated
discrete-time systems under unforced conditions, the state equations of the system under zero input
will be considered.

In the linear case, the general m-th order state-space representation is given by

\[ \delta[x](n) = A^\delta x(n); \tag{1} \]

\[ x(n+1) = x(n) + \Delta \cdot \delta[x](n), \tag{2} \]

where $x(n) = [x_1(n), \ldots, x_m(n)]^T$ is the state vector at instant $n$, $A^\delta \in \mathbb{R}^{m \times m}$ is the
system matrix, and $\Delta > 0$ is the sampling time. Moreover, $\delta[\cdot]$ represents the $\delta$-operator,
that is,

\[ \delta[x_\nu](n) = \frac{x_\nu(n+1) - x_\nu(n)}{\Delta}, \quad \forall \nu = 1, \ldots, m, \tag{3} \]

and $\delta[x](n) = [\delta[x_1](n), \ldots, \delta[x_m](n)]^T$. A $\delta$-system is stable if and only if the following
condition on the eigenvalues $\lambda_i$ of the matrix $A^\delta$ is satisfied [1]:

\[ |1 + \Delta \lambda_i| < 1, \quad i = 1, \ldots, m. \]

Therefore a stable system matrix cannot be singular, i.e. it cannot have a zero eigenvalue.
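
Substituting (1) into (2) gives x(n+1) = (I + Δ·A^δ) x(n), so linear stability of the δ-system amounts to every eigenvalue λ of A^δ satisfying |1 + Δλ| < 1. The following is a minimal sketch of that check, restricted to the 2×2 case so that it needs only the standard library; the function names `eig2` and `is_delta_stable` are hypothetical.

```python
import cmath


def eig2(A):
    """Eigenvalues of a 2x2 matrix from its trace and determinant."""
    (a, b), (c, d) = A
    tr, det = a + d, a * d - b * c
    root = cmath.sqrt(tr * tr - 4 * det)
    return (tr + root) / 2, (tr - root) / 2


def is_delta_stable(A, Delta):
    """Linear stability of x(n+1) = (I + Delta*A) x(n): all |1 + Delta*lam| < 1."""
    return all(abs(1 + Delta * lam) < 1 for lam in eig2(A))
```

A matrix with a zero eigenvalue gives |1 + Δ·0| = 1 and therefore fails the test for every sampling time, which is the singularity remark made above.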

The actual implementation of (1) and (2) in FXP format gives rise to nonlinear
quantization operations that occur at various locations depending on the hardware realization.

Eqn. (1) can be implemented either by using single wordlength accumulators (creating
a quantization error after each multiplication) or by using double wordlength
accumulators (creating a quantization error only after summation). We will only consider the latter
option since practically all modern DSP machines offer double precision accumulators.
Eqn. (1) can then be written as

\[ \delta[x](n) = Q\{A^\delta x(n)\}, \tag{4} \]

where $Q$ is a vector-valued quantization nonlinearity of the form

\[ Q\{x\} = \begin{pmatrix} Q\{x_1\} \\ \vdots \\ Q\{x_m\} \end{pmatrix}. \tag{5} \]

Here, $Q\{x_\nu\}$ can denote magnitude truncation, two's complement truncation, or rounding.
Eqn. (2) can be implemented in two different ways:

\[ x(n+1) = x(n) + Q\{\Delta \cdot \delta[x](n)\}, \tag{6} \]

or

\[ x(n+1) = Q\{x(n) + \Delta \cdot \delta[x](n)\}. \tag{7} \]

Eqn. (6) corresponds to quantization after multiplication while (7) corresponds to
quantization after summation. In contrast to (1), for equation (2) it is not clear which of
the two quantization schemes in (6) and (7) is preferable. We will therefore consider both
possibilities.

Throughout this paper, we will use the following definition of stability:

Definition. The discrete-time system in (4,6) or (4,7) is globally asymptotically stable if and
only if, for any initial condition $x(0)$, the state vector $x$ asymptotically reaches zero, that
is, $x(n) \to 0$ for $n \to \infty$.

Comment. Since the FXP systems considered are in fact finite state machines, the
condition $x(n) \to 0$ for $n \to \infty$ may be strengthened to $x(N) = 0$ for some finite $N$ [5].
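
Because the quantized system is a finite state machine, a zero-input trajectory either reaches zero in finitely many steps or revisits a nonzero state. The sketch below simulates the realizations (4,6) and (4,7) on the integer lattice and reports which of the two happens. It is an illustration under stated assumptions: states are integers in units of the quantization step, coefficients and the sampling time are exact fractions, rounding takes halves away from zero, overflow is ignored, and the names `QUANT`, `step` and `settle` are hypothetical.

```python
from fractions import Fraction
import math

QUANT = {
    # magnitude rounding, halves rounded away from zero (assumption)
    "round": lambda y: (1 if y >= 0 else -1) * math.floor(abs(y) + Fraction(1, 2)),
    "mag_trunc": math.trunc,    # magnitude truncation (toward zero)
    "twos_trunc": math.floor,   # two's complement truncation (toward -infinity)
}


def step(x, A, Delta, quant, after_sum=False):
    """One update of the quantized delta-operator system."""
    d = [quant(sum(a * xi for a, xi in zip(row, x))) for row in A]     # eqn. (4)
    if after_sum:
        return tuple(quant(xi + Delta * di) for xi, di in zip(x, d))   # eqn. (7)
    return tuple(xi + quant(Delta * di) for xi, di in zip(x, d))       # eqn. (6)


def settle(x0, A, Delta, quant, after_sum=False, max_steps=10000):
    """Return ("zero", N) if x(N) = 0, or ("cycle", period) if a state repeats."""
    x, seen = tuple(x0), {}
    for n in range(max_steps):
        if not any(x):
            return ("zero", n)
        if x in seen:
            return ("cycle", n - seen[x])
        seen[x] = n
        x = step(x, A, Delta, quant, after_sum)
    return ("undecided", max_steps)
```

For the scalar system A^δ = -1 with rounding, `settle((8,), [[Fraction(-1)]], Fraction(1, 4), QUANT["round"])` ends in a period-one cycle at x = 1, while the same call with a sampling time of 1/2 reaches zero.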

The following additional symbols will be used:

$l$: quantization step size.

$\mathbf{0}$, $\mathbf{1}$: vectors with all elements being zero or one, respectively.

Int(x): the largest integer function, i.e. the largest integer smaller than or equal to x.

$\mathcal{D}_\delta^{MT}$, $\mathcal{D}_\delta^{R}$, $\mathcal{D}_\delta^{TWO}$: deadbands in terms of the incremental difference vector for magnitude
truncation, rounding and two's complement truncation, respectively.

$\mathcal{D}_x^{MT}$, $\mathcal{D}_x^{R}$, $\mathcal{D}_x^{TWO}$: deadbands in terms of the state vector x for magnitude truncation,
rounding and two's complement truncation, respectively.

$\mathcal{A}_\delta^{MT}$, $\mathcal{A}_\delta^{R}$, $\mathcal{A}_\delta^{TWO}$: corresponding deadbands for the unquantized difference vector.

$\mathcal{H}_l$: largest hypercube embedded in $\mathcal{D}_x$.

$\mathcal{H}_u$: smallest hypercube embedding $\mathcal{D}_x$.


III. NECESSARY CONDITIONS FOR GLOBAL ASYMPTOTIC STABILITY
III.1 DC Limit Cycles

First, we will consider the system described by (4,6). From the definition of global
asymptotic stability as stated in the previous section, it is necessary that

\[ Q\{\Delta \cdot \delta[x](n)\} \ne 0 \quad \text{for any } x(n) \ne 0. \tag{8} \]


This  is  just  one  of  a  finite  set  of  conditions  that  is  required  to  ensure  global  asymptotic 
stability  of  a  FXP  implementation  of  a  linearly  stable  system  [5]. 

The  following  theorem  on  global  asymptotic  stability  of  delta-operator  formulated  discrete 
time  systems  provides  conditions  on  the  sampling  time: 


Theorem 1. A necessary condition for global asymptotic stability of the $\delta$-operator
formulated discrete-time system in (4,6) is $\Delta \ge 0.5$ for rounding and $\Delta \ge 1$ for truncation.

Proof: At first, we will address the case of magnitude rounding. The necessary condition
for global asymptotic stability (8) is violated if

\[ |\Delta \cdot \delta[x_\nu](n)| < \frac{l}{2} \quad \text{for } \nu = 1, \ldots, m \tag{9} \]

and $\delta[x](n) \ne 0$. With

\[ \delta[x_\nu](n) = l \quad \text{for } \nu = 1, \ldots, m, \tag{10} \]

we can rewrite (9) as

\[ \Delta < \frac{1}{2}. \tag{11} \]

If the sampling time is chosen according to (11), then condition (9) is satisfied and hence
the system will exhibit a period one limit cycle. Therefore, in order to avoid a period one
limit cycle we require

\[ \Delta \ge \frac{1}{2}. \tag{12} \]

(Additional constraints will have to be imposed on $\Delta$ in order to guarantee the absence of
limit cycles with a period other than one.) This proves the Theorem for rounding.

In the case of magnitude truncation, equation (9) becomes

\[ |\Delta \cdot \delta[x_\nu](n)| < l \quad \text{for } \nu = 1, \ldots, m \tag{13} \]

with $\delta[x](n) \ne 0$. With (13) and (10), one arrives at the following condition, which excludes
period one limit cycles:

\[ \Delta \ge 1. \tag{14} \]

For two's complement truncation, equation (9) takes the form

\[ 0 \le \Delta \cdot \delta[x_\nu](n) < l. \tag{15} \]

Together with (10), the above equation also results in (14), which proves the Theorem.
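
The argument in the proof can be checked numerically: take the smallest nonzero increment, one quantization step, and ask whether Q{Δ·δ} survives as a nonzero value. The sketch below does this in units of the step size with exact fractions. It assumes that rounding takes halves away from zero, and the function name `smallest_increment_survives` is hypothetical.

```python
from fractions import Fraction
import math


def q_round(y):
    """Magnitude rounding, halves rounded away from zero (assumption)."""
    return (1 if y >= 0 else -1) * math.floor(abs(y) + Fraction(1, 2))


def smallest_increment_survives(Delta, quant):
    """True if Q{Delta * d} is nonzero for the smallest increments d = +1 and d = -1."""
    return all(quant(Fraction(Delta) * d) != 0 for d in (1, -1))
```

With these conventions the check passes for rounding at a sampling time of 1/2 and for both truncation schemes at a sampling time of 1, and fails just below those values.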


The above theorem shows that high-speed $\delta$-operator formulated implementations that
possess a small sampling time cannot be realized limit cycle free in FXP format. Since the
advantages of delta-operator systems with respect to coefficient sensitivity and quantization
noise require a sampling time much smaller than one, this requirement cannot be
met if limit cycles have to be avoided.

A second necessary condition for the system in {(4), (6)} can be obtained by noting that

\[ \delta[x](n) = 0 \tag{16} \]

can occur in (4) even though the state vector $x(n) \ne 0$.

Therefore, for magnitude rounding, no nonzero state vector $x(n)$ that belongs to the
quantization lattice and satisfies

\[ -\frac{l}{2} \cdot \mathbf{1} < A^\delta \cdot x(n) < +\frac{l}{2} \cdot \mathbf{1} \tag{17} \]

may be allowed to exist. In (17), the inequality has to hold elementwise.


Equation (17) has the following geometric interpretation:

Each of the resulting m inequalities can be geometrically interpreted in the state space as
the intersection of two half spaces in $\mathbb{R}^m$. These intersections are symmetric about the
origin and have parallel boundaries. The normal vector to the boundaries is given by the
particular row vector of $A^\delta$. Only if the intersection of all m such regions contains at
least one nonzero point in $\mathbb{R}^m$ on the quantization lattice will there exist a nonzero state
vector that is an equilibrium point of the system due to equation (16). Since we only
consider $A^\delta$ matrices which are stable, the system matrix $A^\delta$ is always invertible. One
can therefore rewrite (1) to obtain a sufficient condition for the existence of nonzero state
vectors which are equilibrium points due to equation (16):

\[ x(n) = (A^\delta)^{-1} \delta[x](n) \quad \text{with } \delta[x](n) \in (-l/2,\ l/2)^m. \tag{18} \]


In order to obtain bounds for each of the components of x(n) we use the infinity norm:

\[ \|x(n)\|_\infty \le \|(A^\delta)^{-1}\|_\infty \|\delta[x](n)\|_\infty < \|(A^\delta)^{-1}\|_\infty \cdot \frac{l}{2}. \tag{19} \]

The parallelepiped described by (18) is therefore embedded in the hypercuboid described
by (19). If (19) does not permit any nonzero points x(n) of the quantization lattice, instability
due to (16) cannot occur. From (19), this is the case if

\[ \|(A^\delta)^{-1}\|_\infty < 2. \tag{20} \]
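
Conditions (17) to (20) lend themselves to a direct check: compute the infinity norm of the inverse system matrix and, if (20) fails, enumerate the lattice points inside the box implied by (19) that satisfy (17). The sketch below does this for a 2×2 matrix with exact fractions, in units of the quantization step. The restriction to 2×2 and the names `inv2`, `inf_norm`, `satisfies_20` and `rounding_equilibria` are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import product
import math


def inv2(A):
    """Exact inverse of a 2x2 matrix."""
    (a, b), (c, d) = A
    det = Fraction(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]


def inf_norm(M):
    """Infinity norm: largest absolute row sum."""
    return max(sum(abs(v) for v in row) for row in M)


def satisfies_20(A):
    """Sufficient condition (20) for the absence of rounding equilibria."""
    return inf_norm(inv2(A)) < 2


def rounding_equilibria(A):
    """Nonzero lattice points x with |(A x)_i| < 1/2 for all i, as in (17)."""
    r = math.ceil(inf_norm(inv2(A)) / 2)   # search box implied by (19)
    half = Fraction(1, 2)
    return [x for x in product(range(-r, r + 1), repeat=2)
            if any(x) and all(abs(sum(a * xi for a, xi in zip(row, x))) < half
                              for row in A)]
```

For A^δ = diag(-1/4, -1) the norm of the inverse is 4, (20) fails, and the enumeration returns the two spurious equilibria (-1, 0) and (1, 0); for A^δ = -I, (20) holds and no such point exists.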

Eqn. (16) can also be interpreted from an eigenvalue/eigenvector viewpoint. In high-speed
digital filters where the sampling frequency is typically much higher than the bandwidth
of the processed signal, the eigenvalues of a $q$-operator implementation cluster around the
point $z = 1$ [1]. The corresponding $\delta$-operator implementation for large sampling times has
eigenvalues clustered around zero. However, as the sampling time becomes small, these
eigenvalues move towards the eigenvalues of the underlying continuous-time system [1]. In
other words, for large sampling times, the system matrix will be ill-conditioned, that is,
vectors $x(n) \ne 0$ exist such that $A^\delta \cdot x(n)$ is close to the zero vector. According to (16),
this is likely to cause a DC limit cycle. For small sampling times, this problem may not
occur; however, in this case, the conditions in Theorem 1 are not satisfied and the system
is already known to produce limit cycles.


In the case of the remaining two quantization schemes, the inequalities corresponding
to (17) are given below. For two's complement truncation,

\[ \mathbf{0} \le A^\delta \cdot x(n) < l \cdot \mathbf{1}, \quad x(n) \ne 0, \tag{21} \]

and, for magnitude truncation,

\[ -l \cdot \mathbf{1} < A^\delta \cdot x(n) < +l \cdot \mathbf{1}, \quad x(n) \ne 0. \tag{22} \]

Again, the above inequalities have to be interpreted elementwise. The embedding hypercubes
can be constructed for the parallelepipeds in (21) and (22) in a similar fashion as for
rounding in (18).


So far, we only addressed the system described by (4,6). A similar analysis can be
conducted for the system in (4,7). Since (4) is common to both realizations, equations
(17,21,22) are still valid and provide conditions under which the finite difference is
quantized to zero and a DC limit cycle is produced. We will now briefly discuss necessary
conditions for global asymptotic stability obtained from (7).

A period one limit cycle exists if the condition

\[ x = Q(x + \Delta \delta[x](n)) \tag{23} \]


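is satisfied for $x \ne 0$. Using a similar argument as in the proof of Theorem 1, for rounding,
equation (23) is satisfied if:

\[ -\frac{l}{2} \le \Delta \delta[x_\nu](n) < \frac{l}{2} \quad \text{for } x_\nu > 0 \tag{24} \]

\[ -\frac{l}{2} < \Delta \delta[x_\nu](n) \le \frac{l}{2} \quad \text{for } x_\nu < 0 \tag{25} \]

\[ -\frac{l}{2} < \Delta \delta[x_\nu](n) < \frac{l}{2} \quad \text{for } x_\nu = 0 \tag{26} \]

Therefore

\[ \Delta > \frac{1}{2} \tag{27} \]

is required to exclude period one limit cycles.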

For magnitude truncation, (23) is satisfied if

\[ 0 \le \Delta \delta[x_\nu](n) < l \quad \text{for } x_\nu > 0 \tag{28} \]

\[ -l < \Delta \delta[x_\nu](n) \le 0 \quad \text{for } x_\nu < 0 \tag{29} \]

\[ -l < \Delta \delta[x_\nu](n) < l \quad \text{for } x_\nu = 0 \tag{30} \]

for $\nu = 1, \ldots, m$.

In the case of two's complement truncation, the condition for a DC limit cycle is simply
given by

\[ 0 \le \Delta \delta[x_\nu](n) < l, \quad \nu = 1, \ldots, m. \tag{31} \]

The conditions (28-30) and (31) again result in the condition $\Delta \ge 1$ for the absence of
period one limit cycles.

We therefore obtain almost the same conclusion as for the previously considered system:

$\Delta > \frac{1}{2}$ for magnitude rounding;

$\Delta \ge 1$ for truncation.

Therefore, Theorem 1 also holds for the system representation in {(4), (7)}, if the condition
for rounding is slightly changed to $\Delta > \frac{1}{2}$.

Up to now, we provided necessary conditions for stability of delta-operator formulated
discrete-time systems in fixed-point arithmetic. Since it has been established that, for
small sampling periods, delta-operator systems always exhibit period one limit cycles,
one needs to examine the amplitude of these limit cycles for a given sampling time in order
to obtain further insight into the practical impact of this problem. In what follows, bounds
on the deadbands will be derived as a function of the $A^\delta$-matrix and the sampling time $\Delta$.

III.2 Deadband Bounds

This subsection provides an answer to the question of the size of the limit cycle
amplitudes. Given a sampling time $\Delta$ and a system matrix $A^\delta$, bounds for the deadbands
as well as the deadband geometry will be described. This will be done in detail for the
case of magnitude truncation. For magnitude rounding and two's complement truncation,
the results will be stated briefly without proof. Since the results for the system (4,7) are
very similar to the results for the system (4,6), this subsection focuses only on the latter.

For each quantization scheme, we will provide the geometry of the deadband in terms
of the incremental difference vector as well as the state vector. Two hypercubes, which
bound the deadband region from the inside and the outside, are also derived for each case.

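Theorem 2:

For the system (4,6) implemented in magnitude truncation, the deadband (in terms of
period one limit cycles) in the incremental difference vector space is given by:

\[ \mathcal{D}_\delta^{MT} = \{\delta[x] \mid \|\delta[x]\|_\infty \le [\mathrm{Int}(\Delta^{-1}) - 1] \cdot l\} \quad \text{for } \mathrm{Int}(\Delta^{-1}) = \Delta^{-1} \tag{32} \]

and

\[ \mathcal{D}_\delta^{MT} = \{\delta[x] \mid \|\delta[x]\|_\infty \le \mathrm{Int}(\Delta^{-1}) \cdot l\} \quad \text{for } \mathrm{Int}(\Delta^{-1}) \ne \Delta^{-1}. \tag{33} \]

The corresponding period one limit cycle deadband in the state space is given by

\[ \mathcal{D}_x^{MT} = \{x \mid x = (A^\delta)^{-1} \delta[x],\ \delta[x] \in \mathcal{A}_\delta^{MT}\} \tag{34} \]

where

\[ \mathcal{A}_\delta^{MT} = \{\delta[x] \mid \|\delta[x]\|_\infty < [\mathrm{Int}(\Delta^{-1}) + 1] \cdot l\} \quad \text{for } \mathrm{Int}(\Delta^{-1}) \ne \Delta^{-1} \tag{35} \]

and

\[ \mathcal{A}_\delta^{MT} = \{\delta[x] \mid \|\delta[x]\|_\infty < \mathrm{Int}(\Delta^{-1}) \cdot l\} \quad \text{for } \mathrm{Int}(\Delta^{-1}) = \Delta^{-1}. \tag{36} \]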

Proof:

The proof will be carried out for $\mathrm{Int}(\Delta^{-1}) \ne \Delta^{-1}$, since the case $\mathrm{Int}(\Delta^{-1}) = \Delta^{-1}$ follows in a
similar fashion. From (13), the condition for period one limit cycles can be expressed as

\[ \|\Delta \delta[x](n)\|_\infty < l. \tag{37} \]

Solving (37) for $\delta[x]$ and considering that $\delta[x]$ produced by equation (4) is an integer
multiple of the quantization step $l$, one obtains

\[ \|\delta[x](n)\|_\infty \le \mathrm{Int}(\Delta^{-1}) \cdot l \tag{38} \]

for $\mathrm{Int}(\Delta^{-1}) \ne \Delta^{-1}$, which is the hypercube in (33). Now consider the following slightly
larger hypercube in $\delta[x]$:

\[ \mathcal{A}_\delta^{MT} = \{\delta[x] \mid \|\delta[x]\|_\infty < [\mathrm{Int}(\Delta^{-1}) + 1] \cdot l\}. \tag{39} \]

$\mathcal{A}_\delta^{MT}$ describes the open set of all incremental difference vectors which, after quantization,
will be mapped into the hypercube $\mathcal{D}_\delta^{MT}$, i.e.

\[ Q(\delta[x]) \in \mathcal{D}_\delta^{MT} \quad \forall \delta[x] \in \mathcal{A}_\delta^{MT}. \]

Therefore the deadband in terms of x can simply be found by determining the set of all x
which satisfy

\[ A^\delta x \in \mathcal{A}_\delta^{MT}. \]

Since $A^\delta$ was assumed to be linearly stable, it is also invertible. Therefore the deadband
in the state space is obtained by

\[ \mathcal{D}_x^{MT} = \{x \mid x = (A^\delta)^{-1} \delta[x],\ \delta[x] \in \mathcal{A}_\delta^{MT}\}. \]

This completes the proof for $\mathrm{Int}(\Delta^{-1}) \ne \Delta^{-1}$.
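
The integer levels appearing in Theorem 2 are easy to tabulate. The sketch below returns, in units of the quantization step, the largest quantized increment inside the deadband (the level in (32)/(33), read as an inclusive bound because the quantized increment lies on the lattice) and the open bound on the unquantized increment from (35)/(36). A brute-force scan over lattice values serves as a cross-check. The function names and the `search` limit are hypothetical.

```python
from fractions import Fraction
import math


def mt_levels(Delta):
    """(K_D, K_A): |delta| <= K_D*l per (32)/(33), |delta| < K_A*l per (35)/(36)."""
    inv = 1 / Fraction(Delta)
    n = math.floor(inv)                 # Int(1/Delta)
    return (n - 1, n) if inv == n else (n, n + 1)


def mt_levels_brute(Delta, search=1000):
    """Largest lattice increment k (in steps) that truncation of Delta*k maps to zero."""
    k = max(k for k in range(search) if math.trunc(Fraction(Delta) * k) == 0)
    return (k, k + 1)
```

For a sampling time of 1/4 both functions give (3, 4): an increment of three steps is still multiplied down to below one step and truncated to zero, while four steps are not.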


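The following Corollary provides the largest hypercube in the state space which is
contained in the parallelepiped $\mathcal{D}_x^{MT}$. This result allows one to obtain the largest magnitude of
state vector components which can still belong to the deadband. It also provides a simple
lower bound on the volume of the deadband.

Corollary 3:

The largest hypercube $\mathcal{H}_l^{MT}$ embedded in $\mathcal{D}_x^{MT}$ is given by:

\[ \mathcal{H}_l^{MT} = \left\{x \;\middle|\; \|x(n)\|_\infty < \frac{[\mathrm{Int}(\Delta^{-1}) + 1] \cdot l}{\|A^\delta\|_\infty}\right\} \quad \text{for } \mathrm{Int}(\Delta^{-1}) \ne \Delta^{-1} \tag{40} \]

and by

\[ \mathcal{H}_l^{MT} = \left\{x \;\middle|\; \|x(n)\|_\infty < \frac{\mathrm{Int}(\Delta^{-1}) \cdot l}{\|A^\delta\|_\infty}\right\} \quad \text{for } \mathrm{Int}(\Delta^{-1}) = \Delta^{-1}. \tag{41} \]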

Proof:

Assume $\mathrm{Int}(\Delta^{-1}) \ne \Delta^{-1}$. From (1) we obtain for the unquantized incremental difference
vector:

\[ \|\delta[x]\|_\infty \le \|A^\delta\|_\infty \|x\|_\infty. \tag{42} \]

Since $\mathcal{A}_\delta^{MT}$ describes the set of unquantized difference vectors which after quantization
map into the deadband region $\mathcal{D}_\delta^{MT}$, one can use the right side of (42) to ensure that
equation (39) is satisfied and obtain:

\[ \|A^\delta\|_\infty \|x\|_\infty < [\mathrm{Int}(\Delta^{-1}) + 1] \cdot l. \tag{43} \]

Solving (43) for $\|x\|_\infty$ produces the desired result. Since $\mathcal{H}_l^{MT}$ is a hypercube centered at
the origin, there exists an $x$ such that

\[ \|\delta[x]\|_\infty = \|A^\delta\|_\infty \cdot \|x\|_\infty. \tag{44} \]

Hence this is the largest such hypercube. The proof for the case $\mathrm{Int}(\Delta^{-1}) = \Delta^{-1}$ follows
from (43) in a similar fashion.


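The next Corollary provides the smallest hypercube in the state space which still
contains $\mathcal{D}_x^{MT}$. This provides an upper bound on the volume of the deadband:

Corollary 4:

The smallest hypercube $\mathcal{H}_u^{MT}$ containing $\mathcal{D}_x^{MT}$ is given by

\[ \mathcal{H}_u^{MT} = \{x \mid \|x\|_\infty < \|(A^\delta)^{-1}\|_\infty (\mathrm{Int}(\Delta^{-1}) + 1) \cdot l\} \quad \text{for } \mathrm{Int}(\Delta^{-1}) \ne \Delta^{-1} \tag{45} \]

and

\[ \mathcal{H}_u^{MT} = \{x \mid \|x\|_\infty < \|(A^\delta)^{-1}\|_\infty \, \mathrm{Int}(\Delta^{-1}) \cdot l\} \quad \text{for } \mathrm{Int}(\Delta^{-1}) = \Delta^{-1}. \tag{46} \]

Proof:

At first consider the case $\mathrm{Int}(\Delta^{-1}) \ne \Delta^{-1}$. From (1) we have for the unquantized state
vector:

\[ x = (A^\delta)^{-1} \delta[x]. \tag{47} \]

Taking norms and using the inequality in (35), we obtain the following open hypercube,
which contains $\mathcal{D}_x^{MT}$:

\[ \|x\|_\infty \le \|(A^\delta)^{-1}\|_\infty \|\delta[x]\|_\infty < \|(A^\delta)^{-1}\|_\infty [\mathrm{Int}(\Delta^{-1}) + 1] \cdot l. \]

Since $\mathcal{A}_\delta^{MT}$ is a hypercube centered at the origin, there exists a $\delta[x]$ such that

\[ \|(A^\delta)^{-1}\|_\infty \|\delta[x]\|_\infty = \|x\|_\infty. \]

Hence $\mathcal{H}_u^{MT}$ is the smallest such hypercube. The proof for the case $\mathrm{Int}(\Delta^{-1}) = \Delta^{-1}$ is
identical and requires the use of (36) instead of (35).

Remarks: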


1. Since $\|(A^\delta)^{-1}\|_\infty \cdot \|A^\delta\|_\infty \ge 1$, we have $\mathcal{H}_l^{MT} \subseteq \mathcal{H}_u^{MT}$. For matrices which satisfy

\[ \|(A^\delta)^{-1}\|_\infty \cdot \|A^\delta\|_\infty = 1 \tag{48} \]

the two hypercubes are identical and coincide with the deadband region, i.e. $\mathcal{H}_l^{MT} = \mathcal{D}_x^{MT} = \mathcal{H}_u^{MT}$.

2. $\mathcal{D}_\delta^{MT}$ is a closed m-D hypercube, centered around the origin. Its boundary coincides
with points of the quantization lattice. The faces of the hypercube are orthogonal to
the corresponding axis of the incremental difference vector space.

3. $\mathcal{D}_x^{MT}$ is an open parallelepiped. It describes the deadband for period one limit
cycles in terms of the state vector.

4. The total deadband includes the region $\mathcal{D}_\delta^{MT}$ ($\mathcal{D}_x^{MT}$) for the incremental difference
vector (the state vector).

5. A useful measure of the deadband size in terms of the state vector x is the volume in
the state space. The 'volume' $Vol_\delta$ of the deadband in $\delta[x]$ is easily computable due to
the hypercube geometry. From (47) we obtain for the volume $Vol_x$ in the state space
of x:

\[ Vol_x = |\det((A^\delta)^{-1})| \cdot Vol_\delta. \tag{49} \]

6. Given a realization, increasing the sampling rate ($\Delta^{-1}$) will result in a larger deadband.
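
The bounds of Corollaries 3 and 4 and the volume relation (49) can be evaluated together. The sketch below does this for a 2×2 system matrix with exact fractions, in units of the quantization step. It assumes that the volume in the increment space is that of the open hypercube in (35)/(36), with side length twice the open level, and the helper names are hypothetical.

```python
from fractions import Fraction
import math


def inf_norm(M):
    """Infinity norm: largest absolute row sum."""
    return max(sum(abs(v) for v in row) for row in M)


def inv2(A):
    """Exact inverse of a 2x2 matrix."""
    (a, b), (c, d) = A
    det = Fraction(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]


def mt_open_level(Delta):
    """Open bound on |delta| in steps, per (35)/(36)."""
    inv = 1 / Fraction(Delta)
    n = math.floor(inv)
    return n if inv == n else n + 1


def deadband_summary(A, Delta):
    """Inner radius (40)/(41), outer radius (45)/(46) and state-space volume (49)."""
    k = mt_open_level(Delta)
    Ainv = inv2(A)
    r_inner = Fraction(k) / inf_norm(A)
    r_outer = inf_norm(Ainv) * k
    det_inv = Ainv[0][0] * Ainv[1][1] - Ainv[0][1] * Ainv[1][0]
    vol_delta = Fraction(2 * k) ** 2          # square of side 2k in the increment space
    return r_inner, r_outer, abs(det_inv) * vol_delta
```

For A^δ = diag(-1/4, -1) and a sampling time of 1/4 this gives an inner radius of 4 steps, an outer radius of 16 steps and a volume of 256, which lies between the inner square (64) and the outer square (1024). For A^δ = -I the two radii coincide, in line with (48).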

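The relationships for the deadband of quantization schemes other than magnitude truncation are given below:

Magnitude Rounding:

$$\mathcal{D}^{MR} = \{\, \delta[x] \mid \|\delta[x]\|_\infty \le [\mathrm{Int}(\tfrac{1}{2}\Delta^{-1}) - 1]\cdot l \,\} \quad \text{for } \mathrm{Int}(\tfrac{1}{2}\Delta^{-1}) = \tfrac{1}{2}\Delta^{-1} \qquad (50)$$

and

$$\mathcal{D}^{MR} = \{\, \delta[x] \mid \|\delta[x]\|_\infty \le \mathrm{Int}(\tfrac{1}{2}\Delta^{-1})\cdot l \,\} \quad \text{for } \mathrm{Int}(\tfrac{1}{2}\Delta^{-1}) \neq \tfrac{1}{2}\Delta^{-1} \qquad (51)$$

For the deadband in terms of the state vector we have the set

$$\{\, x \mid x = (A^\delta)^{-1}\delta[x],\ \delta[x] \in \mathcal{A}^{MR} \,\} \qquad (52)$$

where

$$\mathcal{A}^{MR} = \{\, \delta[x] \mid \|\delta[x]\|_\infty < [\mathrm{Int}(\tfrac{1}{2}\Delta^{-1}) - \tfrac{1}{2}]\cdot l \,\} \quad \text{for } \mathrm{Int}(\tfrac{1}{2}\Delta^{-1}) = \tfrac{1}{2}\Delta^{-1} \qquad (53)$$

and

$$\mathcal{A}^{MR} = \{\, \delta[x] \mid \|\delta[x]\|_\infty < [\mathrm{Int}(\tfrac{1}{2}\Delta^{-1}) + \tfrac{1}{2}]\cdot l \,\} \quad \text{for } \mathrm{Int}(\tfrac{1}{2}\Delta^{-1}) \neq \tfrac{1}{2}\Delta^{-1} \qquad (54)$$

Two's Complement Truncation:

$$\mathcal{D}^{TWO} = \{\, \delta[x] \mid \mathbf{0} \le \delta[x] \le \mathbf{1}\cdot[\mathrm{Int}(\Delta^{-1}) - 1]\cdot l \,\} \quad \text{for } \mathrm{Int}(\Delta^{-1}) = \Delta^{-1} \qquad (55)$$

and

$$\mathcal{D}^{TWO} = \{\, \delta[x] \mid \mathbf{0} \le \delta[x] \le \mathbf{1}\cdot\mathrm{Int}(\Delta^{-1})\cdot l \,\} \quad \text{for } \mathrm{Int}(\Delta^{-1}) \neq \Delta^{-1} \qquad (56)$$

For the deadband in terms of the state vector we have the set

$$\{\, x \mid x = (A^\delta)^{-1}\delta[x],\ \delta[x] \in \mathcal{A}^{TWO} \,\} \qquad (57)$$

where

$$\mathcal{A}^{TWO} = \{\, \delta[x] \mid \mathbf{0} \le \delta[x] < \mathbf{1}\cdot\mathrm{Int}(\Delta^{-1})\cdot l \,\} \quad \text{for } \mathrm{Int}(\Delta^{-1}) = \Delta^{-1} \qquad (58)$$

and

$$\mathcal{A}^{TWO} = \{\, \delta[x] \mid \mathbf{0} \le \delta[x] < \mathbf{1}\cdot[\mathrm{Int}(\Delta^{-1}) + 1]\cdot l \,\} \quad \text{for } \mathrm{Int}(\Delta^{-1}) \neq \Delta^{-1} \qquad (59)$$

In the above set definitions, all inequalities are to be interpreted elementwise, i.e. $x < y$ with $x, y \in \Re^m$ means $x_i < y_i$, $i = 1, \ldots, m$. Furthermore, the notation $\mathbf{0}$, $\mathbf{1}$ stands for the zero vector and the vector with component values of one, respectively.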
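IV.  THE  M-D  CASE


IV.1  Additional  Notation  for  the  m-D  Case


The m-D Roesser model has the following $\delta$-operator formulation [7]:

$$\begin{bmatrix} \delta^{(1)}[x^{(1)}](n) \\ \vdots \\ \delta^{(m)}[x^{(m)}](n) \end{bmatrix} = \begin{bmatrix} A^\delta_{11} & \cdots & A^\delta_{1m} \\ \vdots & \ddots & \vdots \\ A^\delta_{m1} & \cdots & A^\delta_{mm} \end{bmatrix} \begin{bmatrix} x^{(1)}(n) \\ \vdots \\ x^{(m)}(n) \end{bmatrix} + \begin{bmatrix} B^\delta_{1} \\ \vdots \\ B^\delta_{m} \end{bmatrix} u(n); \qquad (60)$$

$$\begin{bmatrix} q^{(1)}[x^{(1)}](n) \\ \vdots \\ q^{(m)}[x^{(m)}](n) \end{bmatrix} = \begin{bmatrix} x^{(1)}(n) \\ \vdots \\ x^{(m)}(n) \end{bmatrix} + \Delta \cdot \begin{bmatrix} \delta^{(1)}[x^{(1)}](n) \\ \vdots \\ \delta^{(m)}[x^{(m)}](n) \end{bmatrix}. \qquad (61)$$
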


The input-state equations in (60) and (61) describe a first hyper-quadrant causal m-D system with a uniform sampling period of $\Delta$ in all directions. The operators $q^{(i)}$ and $\delta^{(i)}$ represent the shift- and delta-operator in the direction specified by the axis $n_i$. In particular

$$q^{(i)}[x^{(i)}](n) = x^{(i)}(n_1, \ldots, n_{i-1}, n_i + 1, n_{i+1}, \ldots, n_m) \qquad (62)$$

$$\delta^{(i)}[x^{(i)}](n) = \frac{1}{\Delta}\left( x^{(i)}(n_1, \ldots, n_{i-1}, n_i + 1, n_{i+1}, \ldots, n_m) - x^{(i)}(n) \right). \qquad (63)$$

Here, $(n) = (n_1, \ldots, n_m)$ denotes a point in the first hyper-quadrant, $x^{(i)}(n)$ is the portion of the state vector propagating in the direction specified by the axis $n_i$, $u(n)$ is the m-D input vector, and $A^\delta_{ij}$ and $B^\delta_i$, for $i = 1, \ldots, m$, $j = 1, \ldots, m$, are the submatrices of the system and input matrices, respectively.


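If (60) is realized in fixed-point arithmetic, it takes the following form under zero-input conditions:

$$\begin{bmatrix} \delta^{(1)}[x^{(1)}](n) \\ \vdots \\ \delta^{(m)}[x^{(m)}](n) \end{bmatrix} = Q\left\{ \begin{bmatrix} A^\delta_{11} & \cdots & A^\delta_{1m} \\ \vdots & \ddots & \vdots \\ A^\delta_{m1} & \cdots & A^\delta_{mm} \end{bmatrix} \begin{bmatrix} x^{(1)}(n) \\ \vdots \\ x^{(m)}(n) \end{bmatrix} \right\}. \qquad (64)$$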


Equation (64) assumes quantization after summation; since practically all modern DSP machines implement this quantization scheme, we only consider this format. The vector-valued quantization nonlinearity $Q\{\cdot\}$ may represent any one of the conventional schemes, viz., magnitude truncation, magnitude rounding, two's complement truncation, and two's complement rounding.

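Equation (61) can be implemented in two different forms:

$$\begin{bmatrix} q^{(1)}[x^{(1)}](n) \\ \vdots \\ q^{(m)}[x^{(m)}](n) \end{bmatrix} = \begin{bmatrix} x^{(1)}(n) \\ \vdots \\ x^{(m)}(n) \end{bmatrix} + Q\left\{ \Delta \cdot \begin{bmatrix} \delta^{(1)}[x^{(1)}](n) \\ \vdots \\ \delta^{(m)}[x^{(m)}](n) \end{bmatrix} \right\} \qquad (65)$$

or

$$\begin{bmatrix} q^{(1)}[x^{(1)}](n) \\ \vdots \\ q^{(m)}[x^{(m)}](n) \end{bmatrix} = Q\left\{ \begin{bmatrix} x^{(1)}(n) \\ \vdots \\ x^{(m)}(n) \end{bmatrix} + \Delta \cdot \begin{bmatrix} \delta^{(1)}[x^{(1)}](n) \\ \vdots \\ \delta^{(m)}[x^{(m)}](n) \end{bmatrix} \right\}. \qquad (66)$$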

Equation  (65)  corresponds  to  quantization  after  multiplication,  whereas  (66)  corresponds 
to  quantization  after  addition.  In  contrast  to  (60),  for  (61),  it  is  not  obvious  which  of  the 
two  forms  stated  above  is  preferable. 

The following definition for asymptotic stability [8] will be used throughout this paper.


Definition. An m-D first hyper-quadrant causal discrete-time system is asymptotically stable under all finitely extended bounded input signals $u(n)$, where

$$|u(n)| \le S, \quad \text{for } n_1 + \cdots + n_m \le D; \qquad (67)$$

$$u(n) = 0, \quad \text{for } n_1 + \cdots + n_m > D, \qquad (68)$$

if all the states of the m-D discrete-time system asymptotically reach zero for $n_1 + \cdots + n_m \to \infty$. Here, $n_\nu \ge 0$, $\nu = 1, \ldots, m$, $S$ is a nonnegative real number, and $D$ is a positive integer.


Since the fixed-point systems considered are in fact finite state machines, the condition

$$\begin{bmatrix} x^{(1)}(n) \\ \vdots \\ x^{(m)}(n) \end{bmatrix} \to 0,$$

for $n_1 + \cdots + n_m \to \infty$, $n_\nu \ge 0$, $\nu = 1, \ldots, m$, can be strengthened to

$$\begin{bmatrix} x^{(1)}(n) \\ \vdots \\ x^{(m)}(n) \end{bmatrix} = 0,$$

for all points $n_1 + \cdots + n_m > c$, $n_\nu \ge 0$, $\nu = 1, \ldots, m$, where $c$ is some finite integer.

IV.2  Necessary  Conditions  for  Global  Asymptotic  Stability

In this section, we present some necessary conditions for stability of a first hyper-quadrant causal m-D discrete-time system represented by its Roesser local state-space model in (60,61). These necessary conditions are formulated in terms of 1-D conditions. The theorem below follows directly from a result in [6] which was formulated for q-operator implemented discrete-time systems. The proof of the theorem rests on the fact that a first hyper-quadrant m-D system can be described by a 1-D system for those locations that are along the m coordinate axes of the boundary of the hyper-quadrant. Reformulating the result in [6] for $\delta$-operator systems produces the following theorem:

Theorem 5.

(a) A necessary condition for global asymptotic stability of the system in (64,65) is that each of the following 1-D systems in (69,70) is globally asymptotically stable:

$$\delta^{(i)}[x^{(i)}](n_i) = Q\left\{ [A^\delta_{ii}]\, x^{(i)}(n_i) \right\}; \qquad (69)$$

$$q^{(i)}[x^{(i)}](n_i) = x^{(i)}(n_i) + Q\left\{ \Delta \cdot \delta^{(i)}[x^{(i)}](n_i) \right\}, \qquad (70)$$

where $i = 1, \ldots, m$.

(b) A necessary condition for global asymptotic stability of the system in (64,66) is that each of the following m 1-D systems in (71,72) is globally asymptotically stable:

$$\delta^{(i)}[x^{(i)}](n_i) = Q\left\{ [A^\delta_{ii}]\, x^{(i)}(n_i) \right\}; \qquad (71)$$

$$q^{(i)}[x^{(i)}](n_i) = Q\left\{ x^{(i)}(n_i) + \Delta \cdot \delta^{(i)}[x^{(i)}](n_i) \right\}, \qquad (72)$$

where $i = 1, \ldots, m$.


Proof.  For  a  detailed  proof,  and  generalizations  to  higher  sub-dimensional  systems,  the 
reader  is  referred  to  [6]. 

Theorem  5  can  be  viewed  as  an  extension  of  the  concept  of  practical  BIBO  stability 
to  asymptotic  stability  of  nonlinear  systems.  It  is  particularly  useful  in  proving  instability 
in  m-D  nonlinear  systems. 

We can now combine Theorem 1 and Theorem 5 to formulate a necessary condition for stability of m-D first hyper-quadrant causal $\delta$-operator formulations of the generalized Roesser model.

Corollary 6.

(a) A necessary condition for global asymptotic stability of the m-D system in (64,65) is

$\Delta \ge 0.5$, for magnitude rounding;

$\Delta \ge 1$, for truncation.

(b) A necessary condition for global asymptotic stability of the m-D system in (64,66) is

$\Delta \ge 0.5$, for magnitude rounding;

$\Delta \ge 1$, for truncation.

Proof. The proof follows from Theorems 1 and 5.
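
The truncation condition can be illustrated with a scalar simulation of the form (64,65). This is a hedged sketch rather than the authors' experiment: the scalar coefficient `a`, the quantization step of one, and the helper names `mag_trunc` and `simulate` are assumptions chosen for illustration.

```python
import math

def mag_trunc(v, step=1.0):
    """Magnitude truncation: quantize toward zero onto a lattice of the given step."""
    return math.trunc(v / step) * step

def simulate(x0, a, delta, steps=50, step=1.0):
    """Scalar zero-input delta-operator recursion, quantization after multiplication.

    d(n)   = Q{a * x(n)}              (scalar analogue of (64))
    x(n+1) = x(n) + Q{delta * d(n)}   (scalar analogue of (65))
    """
    x = x0
    trajectory = [x]
    for _ in range(steps):
        d = mag_trunc(a * x, step)
        x = x + mag_trunc(delta * d, step)
        trajectory.append(x)
    return trajectory

# The underlying linear system x(n+1) = (1 + delta * a) x(n) is stable in both runs.
slow = simulate(10, a=-1.0, delta=1.0)    # Delta = 1: reaches zero
fast = simulate(10, a=-1.0, delta=0.25)   # Delta = 0.25: stalls at a nonzero state
```

With `delta=0.25` the state stops at 3, because the product `delta * d` is truncated to zero once the magnitude of `d` drops below 4; with `delta=1.0` the same system reaches zero.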

Remarks: 

1.  Corollary 6 is also essentially applicable to the case where the sampling time varies with the direction of propagation. In the case of the system description (64,65), the inequalities in Corollary 6 would have to be replaced by

$\Delta_i \ge 0.5$, for magnitude rounding;

$\Delta_i \ge 1$, for truncation,

for $i = 1, \ldots, m$. The conditions for the system (64,66) are analogous.

2.  Our analysis is limited to the zero-input case for which DC limit cycles along the axis were used to derive conditions for non-convergence. If one includes other types of limit cycles in the analysis, or even response types which are not periodic and are known to exist only in the m-D case, the requirements for $\Delta$ may become even more severe.

3.  Corollary 6 shows that fixed-point implementations of 1-D and m-D $\delta$-operator systems cannot be realized limit cycle free, if good coefficient sensitivity and quantization noise measures have to be achieved.

V.  CONCLUSION 

In this paper, it was shown that fixed-point implementations of 1-D and m-D $\delta$-operator systems are not limit cycle free even if the underlying linear system is stable and the sampling time is chosen small. This non-convergent behavior can be explained by the quantization of the $\delta$-term to zero, which leaves the state vector unchanged. The smaller the sampling time, the more severe this effect: the size of the deadband increases with a decreasing sampling time. Therefore, the practical value of $\delta$-operators for fixed-point implementations of 1-D and m-D systems is questionable. There are however indications that this effect is much less severe in floating-point implementations.

$\delta$-operator implemented discrete-time systems represent a class of systems where the quantization noise at the output can be small compared to other realizations. However, as was shown above, such realizations will invariably exhibit limit cycles, which are highly correlated quantization noise. Therefore, in this case, typical measures for quantization noise are of very limited use for obtaining any insight into the likelihood of limit cycles and vice versa.
I.  Introduction


Current interest in $\delta$-systems is due mainly to two reasons: (a) $\delta$-systems provide superior roundoff noise [1-2] and coefficient sensitivity [3-4] properties, and (b) the $\delta$-operator makes it possible to treat both continuous-time (CT) and discrete-time (DT) systems in a unified manner since it yields the differential operator as a limiting case [5-6].

Hence, implementation of two-dimensional (2-D) and multi-dimensional (m-D) systems using the $\delta$-operator can be expected to provide digital filters that perform better in a shorter wordlength environment. If this is the case, such implementations can find widespread use in high performance, real-time applications, where fast sampling and/or shorter wordlength are desired. In such cases, traditional q-operator implementations perform poorly [7].

With this in mind, research directed towards developing models for 2-D and m-D $\delta$-systems is warranted. This paper presents a local state-space (s.s.) model that is the counterpart to the well known q-operator based Roesser model [8]. We also define the notions of gramians and balanced (BL) realization, and address their computation. With these tools in hand, we then investigate coefficient sensitivity properties of this model. Indeed, implementation of 2-D and m-D systems using this Roesser $\delta$-model, under mild conditions, is shown to provide superior coefficient sensitivity compared to the more conventional implementation of the Roesser q-model. As usual, for notational simplicity, we concentrate only on the 2-D case, the extension to the m-D case being quite straightforward.

The paper is organized as follows: Section II provides the nomenclature, some preliminary material, and a brief review of relevant results. Section III contains the development of the Roesser $\delta$-model and some important system theoretic notions. In particular, after establishing the connection between the gramians of one-dimensional (1-D) q- and $\delta$-systems, we define the notion of gramians for 2-D $\delta$-systems. The relationship between these and the gramians of the corresponding 2-D q-systems, the notion of a BL realization, and its computation are then presented. Investigation of coefficient sensitivity of the $\delta$-model is in Section IV. Addressing the more general multi-input multi-output (MIMO) case, for this purpose, two sensitivity measures, applicable for fixed-point (FXP) and floating-point (FLP) arithmetic schemes, are proposed. Conditions under which the proposed $\delta$-model offers superior coefficient sensitivity are also derived. Section V contains an example. Section VI is reserved for concluding remarks.
II.  Nomenclature  and  Preliminaries


2.1.  Nomenclature


3?,  9,  H 
^W]n,  9^Hn 

i'^v  )nv 

In 
{^ij  } 
A\A^ 
trace[A],  Xi[A] 

0,  0 
(n) 

e,- 

pQXP 

Uqxp 

II^IIf 


Reals,  complex  numbers,  nonnegative  integers. 

Set  of  matrices  of  size  g  x  p  over  5?  and  0^. 

Set  of  univariate  polynomials  of  degree  n  (with  respect  to  indeterminate  w  G 
5)  over  3?  and  5>. 

Set  of  rational  univariate  polynomials  of  degree  n  (with  respect  to  indetermi¬ 
nate  w  over  3?. 

Set  of  bivariate  polynomials  of  relative  degrees  rih  and  n*,  (with  respect  to  the 
indeterminates  ly/i  G  ^  and  Wy  G  0^,  respectively)  over  3i. 

Set  of  rational  bivariate  polynomials  of  relative  degrees  Uh  and  Uy  (with  re¬ 
spect  to  the  indeterminates  G  ^  and  Wy  G  respectively)  over  3f. 

Unit  matrix  of  size  n  x  n. 

Elements  of  matrix  A. 

Complex  conjugate  transpose  and  transpose  of  matrix  A. 

TYace  and  z-th  eigenvalue  of  matrix  A. 

Matrix  Kronecker  sum  and  product  operators. 

Unit  vector  in  with  1  on  the  z-th  row. 

ELi  ® 

Frobenius  norm  of  A. 


For q-systems, indeterminate $z$ (with or without a subscript) is used; for $\delta$-systems, we use $c$ (with or without a subscript). In the 1-D case, corresponding q- and $\delta$-systems are related by

$$\delta = \frac{q - 1}{\Delta}; \qquad c = \frac{z - 1}{\Delta},$$

where $\Delta$ is a positive real constant, usually the sampling time.

For  2-D  systems,  subscripts  h  and  v  denote  horizontally  propagating  (h.p.)  and  vertically  propagating  (v.p.) 
subsystems  of  the  corresponding  Roesser  local  s.s.  models. 

$n_h$, $n_v$, $n$: Sizes of the h.p. and v.p. subsystems; $n = n_h + n_v$.

$\Delta_h$, $\Delta_v$: Positive real constants denoting 'sampling times' along h.p. and v.p. directions.

t  E  0  Ayln^  G  Ahln,q  0  Ayln^q  G 

Iz.  Ic  Zhin,  0  Zvin.  G  Chin,  ^  Cylr,^  G 


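Corresponding 2-D q- and $\delta$-systems are related by

$$\delta_h = \frac{q_h - 1}{\Delta_h}; \quad c_h = \frac{z_h - 1}{\Delta_h}; \qquad \delta_v = \frac{q_v - 1}{\Delta_v}; \quad c_v = \frac{z_v - 1}{\Delta_v}.$$

We use subscripts $\delta$ and $q$ to differentiate between corresponding $\delta$- and q-systems; for example, the s.s. realization of a given DT system is either $\{A_\delta, B_\delta, C_\delta, D_\delta\}$ if implemented based on the $\delta$-operator or $\{A_q, B_q, C_q, D_q\}$ if implemented based on the q-operator. The following notation is also used:

$$H(c_h, c_v)\big|_{c \to z} = H\!\left(\frac{z_h - 1}{\Delta_h}, \frac{z_v - 1}{\Delta_v}\right); \qquad H(z_h, z_v)\big|_{z \to c} = H(1 + \Delta_h c_h,\ 1 + \Delta_v c_v).$$
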
Stability studies of 2-D q- and $\delta$-systems involve the following regions:

Tq  {izh,Zv)  €  :  \zh\  <  1,  |z^|  <  1},  {{zh,Zv)  €  :  \zh\  <  1,  |2t,|  <  1}, 

{{zh,Zy)  G  9^  :  |z/,|  =  1,  |z„|  =  1}. 

{(cft.c,)  e  92  :  \c^  +  i/a^I  <  i/a^,,  |c,  +  1/AJ  <  1},  {(cft,c„)  G  9^  : 

|c;,  +  1/Aft|  <  1/A/j,  |ct,  +  1/A„|  <  1},  {(cftjCt,)  G  9^  :  \ch  +  1/Aft|  = 

1/Aft,  |c, +  1/A„1  =  1}. 

A  g-system  polynomial  with  all  its  roots  in  Uq  (for  the  1-D  case)  or  (for  the  2-D  case)  is  said  to  be  stable. 
The  corresponding  regions  for  a  5-system  polynomial  are  lU  (for  the  1-D  case)  and  f//  (for  the  2-D  case), 
respectively. 


2.2.  Preliminaries 

First, we provide a brief introduction to the Roesser local s.s. model applicable to 2-D q-operator based DT systems [8].

Definition 2.1. The following partial ordering of index pairs of nonnegative integers is used:

$(h, k) \le (i, j) \iff h \le i$ and $k \le j$;

$(h, k) = (i, j) \iff h = i$ and $k = j$;

$(h, k) < (i, j) \iff (h, k) \le (i, j)$ and $(h, k) \ne (i, j)$.


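The 2-D dynamical system under consideration is assumed to be linear, shift-invariant, and strictly causal. Moreover, it is taken to be modeled by a set of first-order vector difference equations over $\Re$. Given such a p-input and q-output 2-D system, its $(n_h + n_v)$-th order Roesser local s.s. model is of the form [8]

$$\begin{bmatrix} q_h[x^h](i,j) \\ q_v[x^v](i,j) \end{bmatrix} = \begin{bmatrix} A_q^{(1)} & A_q^{(2)} \\ A_q^{(3)} & A_q^{(4)} \end{bmatrix} \begin{bmatrix} x^h(i,j) \\ x^v(i,j) \end{bmatrix} + \begin{bmatrix} B_q^{(1)} \\ B_q^{(2)} \end{bmatrix} u(i,j) \triangleq [A_q] \begin{bmatrix} x^h(i,j) \\ x^v(i,j) \end{bmatrix} + [B_q]\, u(i,j);$$

$$y(i,j) = \begin{bmatrix} C_q^{(1)} & C_q^{(2)} \end{bmatrix} \begin{bmatrix} x^h(i,j) \\ x^v(i,j) \end{bmatrix} + [D_q]\, u(i,j) \triangleq [C_q] \begin{bmatrix} x^h(i,j) \\ x^v(i,j) \end{bmatrix} + [D_q]\, u(i,j), \qquad (2.1)$$

where $u \in \Re^{p}$, $x^h \in \Re^{n_h}$, $x^v \in \Re^{n_v}$, and $y \in \Re^{q}$. Also, $A_q^{(1)} \in \Re^{n_h \times n_h}$, $A_q^{(2)} \in \Re^{n_h \times n_v}$, $A_q^{(3)} \in \Re^{n_v \times n_h}$, $A_q^{(4)} \in \Re^{n_v \times n_v}$, $B_q^{(1)} \in \Re^{n_h \times p}$, $B_q^{(2)} \in \Re^{n_v \times p}$, $C_q^{(1)} \in \Re^{q \times n_h}$, $C_q^{(2)} \in \Re^{q \times n_v}$, and $D_q \in \Re^{q \times p}$. The operators $q_h[\cdot]$ and $q_v[\cdot]$ denote

$$q_h[x](i,j) = x(i+1, j) \quad \text{and} \quad q_v[x](i,j) = x(i, j+1). \qquad (2.2)$$

The s.s. model in (2.1) is typically denoted by the quadruple $\{A_q, B_q, C_q, D_q\}$. The corresponding 2-D characteristic equation and the 2-D transfer function it realizes are

$$\det[I_z - A_q] = \det[(z_h I_{n_h} \oplus z_v I_{n_v}) - A_q] \in \Re[z_h]_{n_h}[z_v]_{n_v};$$

$$H_q(z_h, z_v) = C_q (I_z - A_q)^{-1} B_q + D_q \in \Re^{q \times p}(z_h)_{n_h}(z_v)_{n_v}, \qquad (2.3)$$

where $z_h$, $z_v$ are complex. In the literature, $x^h$ and $x^v$ are referred to as the horizontally propagating (h.p.) and vertically propagating (v.p.) local state vectors of $\{A_q, B_q, C_q, D_q\}$.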
Assuming no nonessential singularities of the second kind on $T_q^2$, for BIBO stability of the s.s. model above, it is necessary and sufficient that (see [9], and references therein)

$$\det[I_z - A_q] \ne 0, \quad \forall (z_h, z_v) \in \bar{U}_q^2. \qquad (2.4)$$

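For investigating coefficient sensitivity properties, we will use certain relationships encountered in Kronecker products and matrix differentiation. The following are from [10].

The derivative of $A = \{a_{ij}\} \in \Re^{q \times p}$ with respect to $b \in \Re$ is

$$\frac{\partial A}{\partial b} = \left\{ \frac{\partial a_{ij}}{\partial b} \right\} \in \Re^{q \times p}. \qquad (2.5)$$

Hence

$$\left\| \frac{\partial A}{\partial b} \right\|_F^2 = \sum_{i=1}^{q} \sum_{j=1}^{p} \left| \frac{\partial a_{ij}}{\partial b} \right|^2. \qquad (2.6)$$

The derivative of $A = \{a_{ij}\} \in \Re^{q \times p}$ with respect to $B = \{b_{k\ell}\} \in \Re^{s \times r}$ is the partitioned matrix whose $(k, \ell)$-th partition is $\partial A / \partial b_{k\ell}$, that is,

$$\frac{\partial A}{\partial B} = \begin{bmatrix} \dfrac{\partial A}{\partial b_{11}} & \cdots & \dfrac{\partial A}{\partial b_{1r}} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial A}{\partial b_{s1}} & \cdots & \dfrac{\partial A}{\partial b_{sr}} \end{bmatrix} \in \Re^{sq \times rp}. \qquad (2.7)$$

Hence

$$\left\| \frac{\partial A}{\partial B} \right\|_F^2 = \sum_{i=1}^{q} \sum_{j=1}^{p} \sum_{k=1}^{s} \sum_{\ell=1}^{r} \left| \frac{\partial a_{ij}}{\partial b_{k\ell}} \right|^2 = \sum_{k=1}^{s} \sum_{\ell=1}^{r} \left\| \frac{\partial A}{\partial b_{k\ell}} \right\|_F^2. \qquad (2.8)$$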
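III.  State-Space  Model  for  $\delta$-Operator  Implementation


3.1.  Local  s.s.  model

To exploit the superior finite wordlength properties of $\delta$-operator implementations, analogous to the 1-D case, let us define the operators $\delta_h[\cdot]$ and $\delta_v[\cdot]$ as follows:

$$\delta_h[x](i,j) = \frac{q_h[x](i,j) - x(i,j)}{\Delta_h} = \frac{x(i+1,j) - x(i,j)}{\Delta_h}; \qquad \delta_v[x](i,j) = \frac{q_v[x](i,j) - x(i,j)}{\Delta_v} = \frac{x(i,j+1) - x(i,j)}{\Delta_v}, \qquad (3.1)$$

where $\Delta_h$ and $\Delta_v$ are two positive real numbers. Hence, the following relationships are applicable:

$$\delta_h = \frac{q_h - 1}{\Delta_h} \iff q_h = 1 + \Delta_h \delta_h; \qquad \delta_v = \frac{q_v - 1}{\Delta_v} \iff q_v = 1 + \Delta_v \delta_v. \qquad (3.2)$$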


Remark. When $\Delta_h$ and $\Delta_v$ are the 'sampling times' corresponding to the horizontal and vertical spatial directions, the operators $\delta_h$ and $\delta_v$ in fact provide the first-order forward Euler approximants of the corresponding derivatives. When $\Delta_h \to 0$ and $\Delta_v \to 0$, the operators $\delta_h$ and $\delta_v$ yield these derivatives. In the 1-D case, this is the reason for the possibility of a unified treatment of both CT and DT systems [5].
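
The forward-difference interpretation of $\delta_h$ and $\delta_v$ can be sketched numerically. The example below is illustrative only: the helper names `delta_h` and `delta_v`, the grid sizes, and the sampled function $f(s, t) = \sin(s) + t^2$ are assumptions, not taken from the paper.

```python
import numpy as np

def delta_h(x, dh):
    """Horizontal delta operator: (x(i+1, j) - x(i, j)) / dh."""
    return (x[1:, :] - x[:-1, :]) / dh

def delta_v(x, dv):
    """Vertical delta operator: (x(i, j+1) - x(i, j)) / dv."""
    return (x[:, 1:] - x[:, :-1]) / dv

dh, dv = 0.01, 0.02                          # assumed 'sampling times'
s = np.arange(200)[:, None] * dh             # horizontal coordinate
t = np.arange(100)[None, :] * dv             # vertical coordinate
x = np.sin(s) + t ** 2                       # samples x(i, j) = f(i*dh, j*dv)

# Shift operator recovered from the delta operator: q_h = 1 + dh * delta_h.
shifted = x[:-1, :] + dh * delta_h(x, dh)

# Forward Euler approximation of the partial derivative df/ds = cos(s).
err_h = np.max(np.abs(delta_h(x, dh) - np.cos(s[:-1])))
```

Here `err_h` shrinks proportionally to `dh`, in line with the remark that the operators tend to the derivatives as the sampling times tend to zero.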

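With (3.2) in mind, we get

$$\begin{bmatrix} q_h[x^h](i,j) \\ q_v[x^v](i,j) \end{bmatrix} - \begin{bmatrix} x^h(i,j) \\ x^v(i,j) \end{bmatrix} = \begin{bmatrix} (q_h - 1) I_{n_h} & 0 \\ 0 & (q_v - 1) I_{n_v} \end{bmatrix} \begin{bmatrix} x^h(i,j) \\ x^v(i,j) \end{bmatrix} = \tau \begin{bmatrix} \delta_h[x^h](i,j) \\ \delta_v[x^v](i,j) \end{bmatrix}. \qquad (3.3)$$

Here,

$$\tau \triangleq \Delta_h I_{n_h} \oplus \Delta_v I_{n_v} \in \Re^{n \times n}. \qquad (3.4)$$

Using (3.3) in (2.1), it is easy to get the following:

$$\begin{bmatrix} \delta_h[x^h](i,j) \\ \delta_v[x^v](i,j) \end{bmatrix} = \begin{bmatrix} A_\delta^{(1)} & A_\delta^{(2)} \\ A_\delta^{(3)} & A_\delta^{(4)} \end{bmatrix} \begin{bmatrix} x^h(i,j) \\ x^v(i,j) \end{bmatrix} + \begin{bmatrix} B_\delta^{(1)} \\ B_\delta^{(2)} \end{bmatrix} u(i,j) \triangleq [A_\delta] \begin{bmatrix} x^h(i,j) \\ x^v(i,j) \end{bmatrix} + [B_\delta]\, u(i,j);$$

$$y(i,j) = \begin{bmatrix} C_\delta^{(1)} & C_\delta^{(2)} \end{bmatrix} \begin{bmatrix} x^h(i,j) \\ x^v(i,j) \end{bmatrix} + [D_\delta]\, u(i,j) \triangleq [C_\delta] \begin{bmatrix} x^h(i,j) \\ x^v(i,j) \end{bmatrix} + [D_\delta]\, u(i,j). \qquad (3.5)$$

In addition, as opposed to its corresponding q-operator implementation, in a $\delta$-operator implementation, one must perform the following computations:

$$x^h(i+1, j) = x^h(i,j) + \Delta_h \cdot \delta_h[x^h](i,j); \qquad x^v(i, j+1) = x^v(i,j) + \Delta_v \cdot \delta_v[x^v](i,j). \qquad (3.6)$$

Here,

$$A_\delta = \tau^{-1}(A_q - I_n) \iff A_q = I_n + \tau A_\delta; \qquad B_\delta = \tau^{-1} B_q \iff B_q = \tau B_\delta;$$

$$C_\delta = C_q \iff C_q = C_\delta; \qquad D_\delta = D_q \iff D_q = D_\delta. \qquad (3.7)$$

The size of each submatrix in (3.5) is equal to the corresponding submatrix in (2.1). In the sequel, the s.s. realization $\{A_q, B_q, C_q, D_q\}$ in (2.1) will be referred to as the q-model, while the s.s. realization $\{A_\delta, B_\delta, C_\delta, D_\delta\}$ in (3.5) will be referred to as the $\delta$-model.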
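
The conversion between the q-model and the $\delta$-model can be sketched as follows. This is a minimal illustration under stated assumptions: the scaling matrix is taken to be $\Delta_h I_{n_h} \oplus \Delta_v I_{n_v}$, the function names `scaling_matrix`, `q_to_delta`, and `transfer` are hypothetical, and the numerical matrices are arbitrary test data. The check evaluates both transfer functions at matching points $z = 1 + \Delta c$.

```python
import numpy as np

def scaling_matrix(nh, nv, dh, dv):
    """Block-diagonal matrix dh * I_nh (+) dv * I_nv."""
    return np.diag(np.concatenate([np.full(nh, dh), np.full(nv, dv)]))

def q_to_delta(Aq, Bq, Cq, Dq, nh, nv, dh, dv):
    """Map a Roesser q-model to the corresponding delta-model."""
    tau_inv = np.linalg.inv(scaling_matrix(nh, nv, dh, dv))
    return tau_inv @ (Aq - np.eye(nh + nv)), tau_inv @ Bq, Cq, Dq

def transfer(A, B, C, D, wh, wv, nh, nv):
    """Evaluate C (I_w - A)^{-1} B + D with I_w = wh * I_nh (+) wv * I_nv."""
    Iw = np.diag(np.concatenate([np.full(nh, wh), np.full(nv, wv)]))
    return C @ np.linalg.solve(Iw - A, B) + D

# Arbitrary example data (assumed, one h.p. state and one v.p. state).
nh, nv, dh, dv = 1, 1, 0.1, 0.2
Aq = np.array([[0.5, 0.1], [0.2, 0.3]])
Bq = np.array([[1.0], [0.5]])
Cq = np.array([[1.0, 2.0]])
Dq = np.array([[0.0]])
Ad, Bd, Cd, Dd = q_to_delta(Aq, Bq, Cq, Dq, nh, nv, dh, dv)

zh, zv = 0.3 + 0.4j, -0.2 + 0.9j             # arbitrary evaluation point
ch, cv = (zh - 1) / dh, (zv - 1) / dv        # matching point in the c-domain
Hq = transfer(Aq, Bq, Cq, Dq, zh, zv, nh, nv)
Hd = transfer(Ad, Bd, Cd, Dd, ch, cv, nh, nv)
```

The two values `Hq` and `Hd` agree to rounding error, which reflects that both models realize the same input-output behavior.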
3.2.  Properties  of  the  $\delta$-model

The general response equation of the $\delta$-model may be derived in a manner that is exactly analogous to that in [8]. Hence, in what follows, only the salient results are given, detailed derivations being omitted for the sake of brevity.


The general response of the $\delta$-model is given by


=  E^: 


ij-k 

6 


k  =  0 


x^{0,k) 

0 


+E4-" 


y(i,i)  =  (cl‘'  cP>l 


L 

(0,0)<(A.t)<(i,i) 


h  =  0 


0 

Lx^/i-o) 


B 


(1) 


(2) 


u(/i,^*);  (3.8) 


Here, $A_\delta^{i,j}$ refers to the transition matrix of the $\delta$-model. With the partial ordering agreed upon previously (Definition 2.1), it may be recursively computed as follows:


>1^"  = 


0  0 
0  0 


In,  0 

0  /n„ 


'o  o' 

•  • 

1 - 1 

ro 

0  ■ 

0 

0  1 

o 

In.. 

*^3 

- 1 

for  (ij)  <  (0,0); 
for  (iJ)  =  (0,0); 
for  (i,i)  =  (1,0); 
for  (i,j)  =  (0, 1); 
elsewhere. 


(3.9) 


Remarks. 


1.  +  =  In 


If  -r As  —  ^  -f  — /„). 

2.  $A_\delta^{i,0} = (A_\delta^{1,0})^{i}$, $\forall i \ge 1$, and $A_\delta^{0,j} = (A_\delta^{0,1})^{j}$, $\forall j \ge 1$.


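The 2-D $\delta$-model's characteristic equation and transfer function, and their relationships to those of the corresponding q-model, are as follows:

$$\det[I_c - A_\delta] = \det[(c_h I_{n_h} \oplus c_v I_{n_v}) - A_\delta] = \frac{1}{\det[\tau]} \det[I_z - A_q]\big|_{z \to c} \in \Re[c_h]_{n_h}[c_v]_{n_v}; \qquad (3.10)$$

$$H_\delta(c_h, c_v) = C_\delta (I_c - A_\delta)^{-1} B_\delta + D_\delta = H_q(z_h, z_v)\big|_{z \to c} \in \Re^{q \times p}(c_h)_{n_h}(c_v)_{n_v}, \qquad (3.11)$$

where

$$c_h = \frac{z_h - 1}{\Delta_h} \iff z_h = 1 + \Delta_h c_h; \qquad c_v = \frac{z_v - 1}{\Delta_v} \iff z_v = 1 + \Delta_v c_v.$$

As for the q-model, it is easy to show that 2-D equivalent transformations of the type

$$\begin{bmatrix} \hat{x}^h(i,j) \\ \hat{x}^v(i,j) \end{bmatrix} = \begin{bmatrix} T^{(1)} & 0 \\ 0 & T^{(4)} \end{bmatrix} \begin{bmatrix} x^h(i,j) \\ x^v(i,j) \end{bmatrix} \triangleq T \begin{bmatrix} x^h(i,j) \\ x^v(i,j) \end{bmatrix}, \qquad (3.12)$$

where $T^{(1)} \in \Re^{n_h \times n_h}$ and $T^{(4)} \in \Re^{n_v \times n_v}$ are nonsingular, yield an equivalent 2-D s.s. realization $\{\hat{A}_\delta, \hat{B}_\delta, \hat{C}_\delta, \hat{D}_\delta\}$, where

$$\hat{A}_\delta = T A_\delta T^{-1}, \quad \hat{B}_\delta = T B_\delta, \quad \hat{C}_\delta = C_\delta T^{-1}, \quad \text{and} \quad \hat{D}_\delta = D_\delta. \qquad (3.13)$$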
The transfer function of $\{\hat{A}_\delta, \hat{B}_\delta, \hat{C}_\delta, \hat{D}_\delta\}$ is the same as that for $\{A_\delta, B_\delta, C_\delta, D_\delta\}$.

We will also assume that

$$\det[I_c - A_\delta] \ne 0, \quad \forall (c_h, c_v) \in \bar{U}_\delta^2. \qquad (3.14)$$

Due to (2.4) and (3.14), assuming no nonessential singularities of the second kind on $T_\delta^2$, this implies BIBO stability of the 2-D $\delta$-model (see [11], and references therein).

3.3.  Gramians 

In the 2-D q-operator case, reachability and observability gramians are typically taken to be natural extensions of the integral expressions of their 1-D counterparts (see [12-13], and references therein). To adopt a similar approach for the $\delta$-operator case, we first investigate 1-D gramians for the $\delta$-operator case as defined in [5].

1-D case. We quote relevant definitions from [5], p. 194 and 200:

Definition 3.1. [5]. Consider the 1-D stable $\delta$-system $\{A_\delta, B_\delta, C_\delta, D_\delta\}$. The reachability gramian $P_\delta$ and observability gramian $Q_\delta$ are the unique solutions of the following Lyapunov equations:

$$A_\delta P_\delta + P_\delta A_\delta^* + \Delta \cdot A_\delta P_\delta A_\delta^* = -B_\delta B_\delta^*; \qquad A_\delta^* Q_\delta + Q_\delta A_\delta + \Delta \cdot A_\delta^* Q_\delta A_\delta = -C_\delta^* C_\delta.$$


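We now provide the integral representations of $P_\delta$ and $Q_\delta$:

Lemma 3.1. Consider the 1-D stable $\delta$-system $\{A_\delta, B_\delta, C_\delta, D_\delta\}$ with gramians $P_\delta$ and $Q_\delta$. Let $\{A_q, B_q, C_q, D_q\}$ with gramians $P_q$ and $Q_q$ be the corresponding 1-D stable q-system. Then

$$P_\delta = \frac{1}{2\pi j} \oint F_\delta(c) F_\delta^*(c)\, \frac{dc}{1 + \Delta c}; \qquad Q_\delta = \frac{1}{2\pi j} \oint G_\delta^*(c) G_\delta(c)\, \frac{dc}{1 + \Delta c},$$

where $F_\delta(c) = (c I_n - A_\delta)^{-1} B_\delta$, $G_\delta(c) = C_\delta (c I_n - A_\delta)^{-1}$, and the contour is $|c + 1/\Delta| = 1/\Delta$. Moreover,

$$P_\delta = \frac{1}{\Delta} P_q \iff P_q = \Delta P_\delta; \qquad Q_\delta = \Delta Q_q \iff Q_q = \frac{1}{\Delta} Q_\delta.$$

Proof. Note that $A_q = I_n + \Delta A_\delta$, $B_q = \Delta B_\delta$, $C_q = C_\delta$, and $D_q = D_\delta$ [5]. Substitute these in the Lyapunov equation for $P_\delta$ in Definition 3.1 to get

$$A_q P_\delta A_q^* - P_\delta = -\frac{1}{\Delta} B_q B_q^*.$$

Noting that $P_q$ is the unique solution of $A_q P_q A_q^* - P_q = -B_q B_q^*$, we have $P_\delta = P_q / \Delta$. Moreover, the integral expression for $P_q$ is

$$P_q = \frac{1}{2\pi j} \oint_{|z| = 1} F_q(z) F_q^*(z)\, \frac{dz}{z},$$

where $F_q(z) = (z I_n - A_q)^{-1} B_q$ [13]. With $z = 1 + \Delta c$, we have $F_q(z) = F_\delta(c)$ and $dz/z = \Delta\, dc / (1 + \Delta c)$, so the claim regarding $P_\delta$ now follows. The rest follows in a similar manner. ■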
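
The gramian relations of Lemma 3.1 can be checked numerically by solving the q-system Lyapunov equations and substituting the scaled gramians into the $\delta$-system equations of Definition 3.1. The sketch below is illustrative: the helper `dlyap` (a small Kronecker-product solver suitable only for small state dimensions) and the example matrices are assumptions, and real-valued matrices are used so that the conjugate transpose reduces to the transpose.

```python
import numpy as np

def dlyap(A, W):
    """Solve A P A^T - P = -W by vectorization (adequate for small n)."""
    n = A.shape[0]
    K = np.eye(n * n) - np.kron(A, A)
    return np.linalg.solve(K, W.reshape(-1)).reshape(n, n)

Delta = 0.1                                    # assumed sampling time
Aq = np.array([[0.9, 0.05], [-0.1, 0.8]])      # stable q-system (assumed data)
Bq = np.array([[0.1], [0.2]])
Cq = np.array([[1.0, -1.0]])

# Corresponding delta-system: A_q = I + Delta * A_d, B_q = Delta * B_d, C_q = C_d.
Ad = (Aq - np.eye(2)) / Delta
Bd = Bq / Delta
Cd = Cq

Pq = dlyap(Aq, Bq @ Bq.T)                      # reachability gramian of the q-system
Qq = dlyap(Aq.T, Cq.T @ Cq)                    # observability gramian of the q-system
Pd = Pq / Delta                                # relation claimed in Lemma 3.1
Qd = Delta * Qq

# Residuals of the delta-domain Lyapunov equations of Definition 3.1.
res_P = Ad @ Pd + Pd @ Ad.T + Delta * Ad @ Pd @ Ad.T + Bd @ Bd.T
res_Q = Ad.T @ Qd + Qd @ Ad + Delta * Ad.T @ Qd @ Ad + Cd.T @ Cd
```

Both residuals vanish to rounding error, so any solver available for q-system gramians can be reused for the $\delta$-system by a simple scaling.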


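2-D case. With Lemma 3.1 in mind, we now present the following

Definition 3.2. Consider the 2-D stable $\delta$-system $\{A_\delta, B_\delta, C_\delta, D_\delta\}$. The reachability gramian $P_\delta$ and observability gramian $Q_\delta$ are defined as

$$P_\delta = \begin{bmatrix} P_\delta^{(1)} & P_\delta^{(2)} \\ P_\delta^{(3)} & P_\delta^{(4)} \end{bmatrix} \triangleq \frac{1}{(2\pi j)^2} \oint\!\!\oint F_\delta(c_h, c_v) F_\delta^*(c_h, c_v)\, \frac{dc_h}{1 + \Delta_h c_h}\, \frac{dc_v}{1 + \Delta_v c_v};$$

$$Q_\delta = \begin{bmatrix} Q_\delta^{(1)} & Q_\delta^{(2)} \\ Q_\delta^{(3)} & Q_\delta^{(4)} \end{bmatrix} \triangleq \frac{1}{(2\pi j)^2} \oint\!\!\oint G_\delta^*(c_h, c_v) G_\delta(c_h, c_v)\, \frac{dc_h}{1 + \Delta_h c_h}\, \frac{dc_v}{1 + \Delta_v c_v},$$

where

$$F_\delta(c_h, c_v) \triangleq (I_c - A_\delta)^{-1} B_\delta = \begin{bmatrix} f_{\delta 1} \\ \vdots \\ f_{\delta n} \end{bmatrix}; \qquad G_\delta(c_h, c_v) \triangleq C_\delta (I_c - A_\delta)^{-1} = \begin{bmatrix} g_{\delta 1} & \cdots & g_{\delta n} \end{bmatrix}.$$

Remarks.

1. Note that $f_{\delta i}(c_h, c_v) \in \Re^{1 \times p}(c_h)_{n_h}(c_v)_{n_v}$, $\forall i = 1, \ldots, n$, and $g_{\delta j}(c_h, c_v) \in \Re^{q \times 1}(c_h)_{n_h}(c_v)_{n_v}$, $\forall j = 1, \ldots, n$.

2. To eventually compare the performance of the $\delta$-model and its corresponding q-model, the following relationships will be useful:

$$F_\delta(c_h, c_v)\big|_{c \to z} = F_q(z_h, z_v) \implies f_{\delta i}(c_h, c_v)\big|_{c \to z} = f_{q i}(z_h, z_v), \quad i = 1, \ldots, n;$$

$$G_\delta(c_h, c_v)\big|_{c \to z} = G_q(z_h, z_v)\, \tau \implies g_{\delta i}(c_h, c_v)\big|_{c \to z} = \begin{cases} \Delta_h\, g_{q i}, & \text{for } i = 1, \ldots, n_h; \\ \Delta_v\, g_{q i}, & \text{for } i = n_h + 1, \ldots, n. \end{cases} \qquad (3.15)$$

3. Definition 3.2 is completely analogous to the 1-D and 2-D q-operator cases. In the latter case, these gramians have been extremely useful in, and hence have been extensively used for, investigating coefficient sensitivity, roundoff noise propagation, model reduction, etc. For instance, see [12-16], and references therein.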

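Lemma 3.2. Consider the 2-D stable $\delta$-system $\{A_\delta, B_\delta, C_\delta, D_\delta\}$ with gramians $P_\delta$ and $Q_\delta$. Let $\{A_q, B_q, C_q, D_q\}$ with gramians $P_q$ and $Q_q$ be the corresponding 2-D stable q-system. Then

$$P_\delta = \frac{1}{\Delta_h \Delta_v} P_q \iff P_q = \Delta_h \Delta_v P_\delta; \qquad Q_\delta = \frac{1}{\Delta_h \Delta_v}\, \tau Q_q \tau \iff Q_q = \Delta_h \Delta_v\, \tau^{-1} Q_\delta \tau^{-1}.$$

Proof. Consider the integral expression for $P_\delta$ in Definition 3.2. With the variable change $c \to z$ and (3.15), we get

$$P_\delta = \frac{1}{\Delta_h \Delta_v} \cdot \frac{1}{(2\pi j)^2} \oint\!\!\oint F_q(z_h, z_v) F_q^*(z_h, z_v)\, \frac{dz_h}{z_h}\, \frac{dz_v}{z_v}.$$

However [12],

$$P_q = \frac{1}{(2\pi j)^2} \oint\!\!\oint F_q(z_h, z_v) F_q^*(z_h, z_v)\, \frac{dz_h}{z_h}\, \frac{dz_v}{z_v}.$$

Hence, the claim regarding $P_\delta$ follows. The proof regarding $Q_\delta$ is similar. ■

Corollary 3.3. The block matrices of the gramians are related as follows:

$$\begin{bmatrix} P_\delta^{(1)} & P_\delta^{(2)} \\ P_\delta^{(3)} & P_\delta^{(4)} \end{bmatrix} = \frac{1}{\Delta_h \Delta_v} \begin{bmatrix} P_q^{(1)} & P_q^{(2)} \\ P_q^{(3)} & P_q^{(4)} \end{bmatrix}; \qquad \begin{bmatrix} Q_\delta^{(1)} & Q_\delta^{(2)} \\ Q_\delta^{(3)} & Q_\delta^{(4)} \end{bmatrix} = \begin{bmatrix} \dfrac{\Delta_h}{\Delta_v} Q_q^{(1)} & Q_q^{(2)} \\ Q_q^{(3)} & \dfrac{\Delta_v}{\Delta_h} Q_q^{(4)} \end{bmatrix}.$$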
Proof. This follows directly from Lemma 3.2.


With the above results in mind, we now make some pertinent statements that are in complete analogy with the 2-D q-operator case. These may be easily verified from the corresponding results for the latter (see [12], and references therein).

Lemma  3.4.  The  gramians  may  be  represented  as  follows: 


where 


1  =  0  j  =0 


■M; 


Qs  = 


oo 


0  j  =  o 


c:csA\ 


Ms,,  = 


0, 


<5 


for  {i,j)  -  (0,0); 
for  (z,i)  >  (0,0). 


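Lemma 3.5. Consider the 2-D stable $\delta$-model $\{A_\delta, B_\delta, C_\delta, D_\delta\}$ with gramians $P_\delta$ and $Q_\delta$. Let $\{\hat{A}_\delta, \hat{B}_\delta, \hat{C}_\delta, \hat{D}_\delta\}$ with gramians $\hat{P}_\delta$ and $\hat{Q}_\delta$ be an equivalent system obtained with a nonsingular transformation of the type in (3.12-13). Then, $\hat{P}_\delta = T P_\delta T^*$ and $\hat{Q}_\delta = T^{-*} Q_\delta T^{-1}$. Moreover, the eigenvalues of $P_\delta Q_\delta$ are invariant under such a transformation.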
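
The invariance claim in Lemma 3.5 follows from $\hat{P}_\delta \hat{Q}_\delta = T P_\delta Q_\delta T^{-1}$, a similarity transformation. A small numerical sketch is given below; the random symmetric positive definite matrices are stand-ins for gramians (an assumption for illustration, not gramians of an actual system), and the block sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_spd(n):
    """Random symmetric positive definite matrix (stand-in for a gramian)."""
    M = rng.standard_normal((n, n))
    return M @ M.T + n * np.eye(n)

P = random_spd(3)
Q = random_spd(3)

# Block-diagonal transformation with n_h = 2 and n_v = 1 (assumed sizes).
T = np.zeros((3, 3))
T[:2, :2] = [[2.0, 1.0], [0.0, 1.0]]
T[2, 2] = 3.0
T_inv = np.linalg.inv(T)

P_hat = T @ P @ T.T                 # transformed reachability gramian
Q_hat = T_inv.T @ Q @ T_inv         # transformed observability gramian

eig = np.sort(np.linalg.eigvals(P @ Q).real)
eig_hat = np.sort(np.linalg.eigvals(P_hat @ Q_hat).real)
```

The sorted eigenvalue vectors `eig` and `eig_hat` coincide to rounding error, even though `P_hat` and `Q_hat` individually differ from `P` and `Q`.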

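Definition 3.3. The 2-D $\delta$-model $\{A_\delta, B_\delta, C_\delta, D_\delta\}$ is said to be balanced (BL) if its gramians $P_\delta$ and $Q_\delta$ satisfy

$$P_\delta^{(1)} = Q_\delta^{(1)} = \mathrm{diag}\{\sigma_1^{(1)}, \ldots, \sigma_{n_h}^{(1)}\}; \qquad P_\delta^{(4)} = Q_\delta^{(4)} = \mathrm{diag}\{\sigma_1^{(4)}, \ldots, \sigma_{n_v}^{(4)}\}.$$

We refer to $\sigma_i^{(1)}$, $i = 1, \ldots, n_h$, and $\sigma_j^{(4)}$, $j = 1, \ldots, n_v$, as the Hankel singular values of the h.p. and v.p. subsystems, respectively.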

If the principal block diagonal matrices of $P_\delta$ and $Q_\delta$ are each positive definite, a corresponding BL realization may be obtained through a certain simultaneous diagonalization procedure referred to as Laub's algorithm [17]. Regarding this, we have

Lemma 3.6. Local reachability and observability of the $\delta$-model $\{A_\delta, B_\delta, C_\delta, D_\delta\}$ and its corresponding q-model $\{A_q, B_q, C_q, D_q\}$ are equivalent. Moreover, when $\{A_\delta, B_\delta, C_\delta, D_\delta\}$ is locally reachable and observable, $P_\delta^{(1)}$, $P_\delta^{(4)}$, $Q_\delta^{(1)}$, and $Q_\delta^{(4)}$ are each positive definite.

Separable systems. A separable (in denominator) 2-D q-system has the property that $A_q^{(2)} = 0$ (or, equivalently, $A_q^{(3)} = 0$). For such a system, off-diagonal submatrices of $P_q$ and $Q_q$ are all zero [12]. Moreover, the diagonal submatrices may be conveniently computed through the solution of two pairs of Lyapunov equations.


From (3.7), it is clear that a separable 2-D q-system gives rise to a separable 2-D $\delta$-system. Regarding the corresponding gramians, we may state the following

Theorem 3.7. Consider the separable 2-D $\delta$-system $\{A_\delta, B_\delta, C_\delta, D_\delta\}$ with gramians $P_\delta$ and $Q_\delta$. Then, $P_\delta^{(2)} = Q_\delta^{(2)} = 0$ and $P_\delta^{(3)} = Q_\delta^{(3)} = 0$. Moreover, the diagonal block matrices of $P_\delta$ and $Q_\delta$ may be computed through solution of the following two pairs of Lyapunov equations:


+  +  AnAY>py>A^ 


(1)  p(i)  /i(i)* 


1 


A„ 


4"'«r + =  -  ^  1  C<‘>  kS'>4’’  1  ■  I  c!‘>  4‘'4'‘> : 


44)  (4)^  (4)^(4).  (.)^(.)- 


A/i 


5(2)  43)^a)][5(2)  ^(3)51 


(1)1 


+  QfAf  +  )'cf\ 


where  - 


=  and  -  A,.A„pf  \ 


Proof. Results regarding the off-diagonal submatrices are obvious from Corollary 3.3. Regarding the diagonal submatrices, the claim may be shown using Theorem 3.2.2 of [12]. For instance, consider the Lyapunov equation

^(i)'Q(i)A(o  -  ga)  =  -  A(^rQWAf), 

Using (3.7) and Corollary 3.3, the second Lyapunov equation in the claim results. The rest follows in a similar manner. ■


^  Computation  of  BL  Realizations 

Computation  of  gramians  and  obtaining  BL  realizations  for  g-systems  have  been  investigated  quite  thor¬ 
oughly.  In  the  1-D  and  2-D  separable  cases,  one  may  solve  Lyapunov  equations  and  use  Laub’s  algorithm 
[17].  In  the  2-D  non-separable  case,  this  computation  is  not  that  easy;  however,  several  techniques  have  been 
developed  [12],  [18]. 

In this section, we provide the relationship between BL realizations of corresponding δ- and q-models. This allows all available techniques for gramian computation of q-systems to be utilized for δ-systems as well. We believe this to be an important contribution, and, to the authors' knowledge, such a relationship is not available even for the 1-D case. The development below concentrates on the 2-D case; the 1-D case is even simpler. For convenience, we use the following notation:

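$\{A,B,C,D\} \xrightarrow{T} \{\hat A,\hat B,\hat C,\hat D\}$: This denotes $\hat A = TAT^{-1}$, $\hat B = TB$, $\hat C = CT^{-1}$, and $\hat D = D$, where $T$ is of type (3.12-13).

$\{A_q,B_q,C_q,D_q\} \xrightarrow{q2\delta} \{A_\delta,B_\delta,C_\delta,D_\delta\}$: This denotes the corresponding δ-system obtained by applying (3.7).

$\{A_\delta,B_\delta,C_\delta,D_\delta\} \xrightarrow{\delta 2q} \{A_q,B_q,C_q,D_q\}$: This denotes the corresponding q-system obtained by applying (3.7).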


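Moreover, we use the following:

$\{A_q,B_q,C_q,D_q\}$: Given 2-D q-system.

$\{A_{qB},B_{qB},C_{qB},D_{qB}\}$: BL realization of $\{A_q,B_q,C_q,D_q\}$ obtained by applying $T_q$, that is, $\{A_q,B_q,C_q,D_q\} \xrightarrow{T_q} \{A_{qB},B_{qB},C_{qB},D_{qB}\}$.

$\{A_\delta,B_\delta,C_\delta,D_\delta\}$: 2-D δ-system obtained by applying (3.7) to $\{A_q,B_q,C_q,D_q\}$, that is, $\{A_q,B_q,C_q,D_q\} \xrightarrow{q2\delta} \{A_\delta,B_\delta,C_\delta,D_\delta\}$.

$\{A_{\delta B},B_{\delta B},C_{\delta B},D_{\delta B}\}$: BL realization of $\{A_\delta,B_\delta,C_\delta,D_\delta\}$ obtained by applying $T_\delta$, that is, $\{A_\delta,B_\delta,C_\delta,D_\delta\} \xrightarrow{T_\delta} \{A_{\delta B},B_{\delta B},C_{\delta B},D_{\delta B}\}$.

$\{A_{\delta B2q},B_{\delta B2q},C_{\delta B2q},D_{\delta B2q}\}$: 2-D q-system obtained by applying (3.7) to $\{A_{\delta B},B_{\delta B},C_{\delta B},D_{\delta B}\}$, that is, $\{A_{\delta B},B_{\delta B},C_{\delta B},D_{\delta B}\} \xrightarrow{\delta 2q} \{A_{\delta B2q},B_{\delta B2q},C_{\delta B2q},D_{\delta B2q}\}$.

$\{A_{qB2\delta},B_{qB2\delta},C_{qB2\delta},D_{qB2\delta}\}$: 2-D δ-system obtained by applying (3.7) to $\{A_{qB},B_{qB},C_{qB},D_{qB}\}$, that is, $\{A_{qB},B_{qB},C_{qB},D_{qB}\} \xrightarrow{q2\delta} \{A_{qB2\delta},B_{qB2\delta},C_{qB2\delta},D_{qB2\delta}\}$.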
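Lemma 3.8. The following relationships are true:

$$\{A_q,B_q,C_q,D_q\} \xrightarrow{T_\delta} \{A_{\delta B2q},B_{\delta B2q},C_{\delta B2q},D_{\delta B2q}\}; \qquad \{A_\delta,B_\delta,C_\delta,D_\delta\} \xrightarrow{T_q} \{A_{qB2\delta},B_{qB2\delta},C_{qB2\delta},D_{qB2\delta}\}.$$

Proof. First,

$$A_{\delta B2q} = I_n + \Gamma A_{\delta B} = I_n + \Gamma T_\delta A_\delta T_\delta^{-1} = I_n + \Gamma T_\delta \Gamma^{-1}(A_q - I_n)T_\delta^{-1} = T_\delta A_q T_\delta^{-1},$$

since $\Gamma T_\delta = T_\delta \Gamma$. The remainder may be proven in a similar manner. ■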

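Lemma 3.9. The following relationships are true:

$$\{A_{\delta B2q},B_{\delta B2q},C_{\delta B2q},D_{\delta B2q}\} \xrightarrow{\Gamma^{-1/2}} \{A_{qB},B_{qB},C_{qB},D_{qB}\};$$

$$\{A_{qB2\delta},B_{qB2\delta},C_{qB2\delta},D_{qB2\delta}\} \xrightarrow{\Gamma^{1/2}} \{A_{\delta B},B_{\delta B},C_{\delta B},D_{\delta B}\}.$$

Proof. Note that $\{A_{\delta B},B_{\delta B},C_{\delta B},D_{\delta B}\}$ has the following gramians:

$$P_{\delta B} = \begin{bmatrix} \Sigma_\delta^{(1)} & P_{\delta B}^{(2)} \\ P_{\delta B}^{(3)} & \Sigma_\delta^{(4)} \end{bmatrix}; \qquad Q_{\delta B} = \begin{bmatrix} \Sigma_\delta^{(1)} & Q_{\delta B}^{(2)} \\ Q_{\delta B}^{(3)} & \Sigma_\delta^{(4)} \end{bmatrix}.$$

Hence, from Corollary 3.3, gramians of $\{A_{\delta B2q},B_{\delta B2q},C_{\delta B2q},D_{\delta B2q}\}$ are as follows:

$$P_{\delta B2q} = \Delta_h\Delta_v \begin{bmatrix} \Sigma_\delta^{(1)} & P_{\delta B}^{(2)} \\ P_{\delta B}^{(3)} & \Sigma_\delta^{(4)} \end{bmatrix}; \qquad Q_{\delta B2q} = \begin{bmatrix} (\Delta_v/\Delta_h)\Sigma_\delta^{(1)} & Q_{\delta B}^{(2)} \\ Q_{\delta B}^{(3)} & (\Delta_h/\Delta_v)\Sigma_\delta^{(4)} \end{bmatrix}.$$

To get $\{A_{qB},B_{qB},C_{qB},D_{qB}\}$, we need to simultaneously diagonalize the two pairs $\{\Delta_h\Delta_v\Sigma_\delta^{(1)}, (\Delta_v/\Delta_h)\Sigma_\delta^{(1)}\}$ and $\{\Delta_h\Delta_v\Sigma_\delta^{(4)}, (\Delta_h/\Delta_v)\Sigma_\delta^{(4)}\}$. By applying Laub's algorithm, we get these two transformations to be $\Delta_h^{-1/2} I$ and $\Delta_v^{-1/2} I$. This proves the first part. The remainder follows in a similar manner. ■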

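Corollary 3.10. The relationship between $\{A_{qB},B_{qB},C_{qB},D_{qB}\}$ and $\{A_{\delta B},B_{\delta B},C_{\delta B},D_{\delta B}\}$ is as follows:

$$A_{\delta B} = \Gamma^{-1/2}(A_{qB} - I_n)\Gamma^{-1/2}; \quad B_{\delta B} = \Gamma^{-1/2}B_{qB}; \quad C_{\delta B} = C_{qB}\Gamma^{-1/2}; \quad D_{\delta B} = D_{qB}.$$

Proof. Note that, from Lemma 3.9,

$$A_{\delta B} = \Gamma^{-1}(A_{\delta B2q} - I_n) = \Gamma^{-1}(\Gamma^{1/2}A_{qB}\Gamma^{-1/2} - I_n) = \Gamma^{-1/2}(A_{qB} - I_n)\Gamma^{-1/2}.$$

The rest follows in a similar manner. ■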
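The algebra behind Corollary 3.10 can be checked numerically on a toy case. The following is only a sketch: the 2-state matrix, the values of $\Delta_h$ and $\Delta_v$, and all function names are hypothetical, and $\Gamma$ is assumed to be diagonal with $\Delta_h$ on the horizontal states and $\Delta_v$ on the vertical states. It confirms that applying (3.7) to a q-model and then the similarity transformation $\Gamma^{1/2}$ gives the same state matrix as $\Gamma^{-1/2}(A_{qB} - I)\Gamma^{-1/2}$; it does not perform any balancing.

```python
import math

def q_to_delta(a_q, gam):
    """State-matrix part of (3.7): A_delta = Gamma^{-1} (A_q - I), Gamma = diag(gam)."""
    n = len(a_q)
    return [[(a_q[i][j] - (1.0 if i == j else 0.0)) / gam[i] for j in range(n)]
            for i in range(n)]

def diag_similarity(a, t):
    """T A T^{-1} for a diagonal T = diag(t)."""
    n = len(a)
    return [[t[i] * a[i][j] / t[j] for j in range(n)] for i in range(n)]

def corollary_3_10(a_qb, gam):
    """Gamma^{-1/2} (A_qB - I) Gamma^{-1/2} for Gamma = diag(gam)."""
    n = len(a_qb)
    return [[(a_qb[i][j] - (1.0 if i == j else 0.0)) / math.sqrt(gam[i] * gam[j])
             for j in range(n)] for i in range(n)]

# Hypothetical data: one horizontal state and one vertical state.
a_qb = [[0.9, 0.2], [-0.2, 0.8]]
gam = [0.5, 0.25]  # diagonal of Gamma, i.e. (Delta_h, Delta_v)

# Route 1: apply (3.7), then the similarity transformation Gamma^{1/2}.
via_lemma = diag_similarity(q_to_delta(a_qb, gam), [math.sqrt(g) for g in gam])
# Route 2: the closed form of Corollary 3.10.
via_corollary = corollary_3_10(a_qb, gam)

max_gap = max(abs(via_lemma[i][j] - via_corollary[i][j])
              for i in range(2) for j in range(2))
print(max_gap)
```

Because $\Gamma$ is diagonal, both routes reduce to dividing entry $(i,j)$ of $A_{qB} - I$ by $\sqrt{\gamma_i\gamma_j}$, which is why the gap is at rounding level.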

The above are summarized below. Note that the missing 'links' may also be easily obtained.


{Aq,  Bq,Cq,  Dq} 

{As,Bs:Cs,  Ds] 


{AqB,  BqB,CqB,  BqB}  -  {^<552?,  BsB2q,CbB2q  ,  BsB2q} 

jCorollary  3.10 


{Asb  j  Bsb  ,Csb,  DbB  } 


(rl/2 


BqB2b,CqB2b,  DqB2b} 
IV. Coefficient Sensitivity


Coefficient sensitivity is an important criterion on which one s.s. realization may be preferred over another. By generalizing a certain sensitivity measure in [19], Lutz and Hakimi [20] have addressed sensitivity minimization of MIMO 1-D CT systems. The 2-D q-operator case appears in [15], and references therein. This work, applicable only to the SISO case, reveals that realizations possessing minimum coefficient sensitivity are equivalent to BL (modulo a block orthogonal similarity transformation) realizations (see [12], and references therein).

In what follows, we study coefficient sensitivity properties of the 2-D δ-model introduced in Section III. Both FXP and FLP arithmetic implementations are addressed. We follow a more direct approach via Kronecker product formulation and, as a result, this work is applicable to the more general MIMO case.

In practice, effects of coefficient sensitivity appear in the system frequency response. Hence, it is appropriate to study the quantities $\partial H_\delta/\partial A_\delta$, $\partial H_\delta/\partial B_\delta$, $\partial H_\delta/\partial C_\delta$, and $\partial H_\delta/\partial D_\delta$. Using relationships regarding matrix Kronecker products taken from the excellent treatise of Brewer [10], we first develop certain relationships regarding these quantities. For the readers' convenience, relationships used from [10] are identified by the same equation numbers (these begin with the letter T).

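First,

$$S_{\delta A}(c_h,c_v) = \frac{\partial}{\partial A_\delta}H_\delta(c_h,c_v) = \frac{\partial}{\partial A_\delta}\left[C_\delta(I_c - A_\delta)^{-1}B_\delta + D_\delta\right]$$

$$= -[I_n \otimes C_\delta][I_n \otimes (I_c - A_\delta)^{-1}] \cdot \frac{\partial}{\partial A_\delta}[I_c - A_\delta] \cdot [I_n \otimes (I_c - A_\delta)^{-1}][I_n \otimes B_\delta] \quad \text{from (T4.3) and (T5.5)}$$

$$= [I_n \otimes C_\delta(I_c - A_\delta)^{-1}] \cdot \bar U_{n\times n} \cdot [I_n \otimes (I_c - A_\delta)^{-1}B_\delta] \quad \text{from (T2.4) and (T5.1).}$$

Hence

$$S_{\delta A}(c_h,c_v) = [I_n \otimes G_\delta] \cdot \bar U_{n\times n} \cdot [I_n \otimes F_\delta]. \qquad (4.1)$$

Second,

$$S_{\delta B}(c_h,c_v) = \frac{\partial}{\partial B_\delta}H_\delta(c_h,c_v) = \frac{\partial}{\partial B_\delta}\left[C_\delta(I_c - A_\delta)^{-1}B_\delta + D_\delta\right] = \frac{\partial}{\partial B_\delta}[G_\delta B_\delta]$$

$$= [I_n \otimes G_\delta] \cdot \frac{\partial B_\delta}{\partial B_\delta} \quad \text{from (T4.3).}$$

Hence

$$S_{\delta B}(c_h,c_v) = [I_n \otimes G_\delta] \cdot \bar U_{n\times p}. \qquad (4.2)$$

Third,

$$S_{\delta C}(c_h,c_v) = \frac{\partial}{\partial C_\delta}H_\delta(c_h,c_v) = \frac{\partial}{\partial C_\delta}\left[C_\delta(I_c - A_\delta)^{-1}B_\delta + D_\delta\right] = \frac{\partial}{\partial C_\delta}[C_\delta F_\delta]$$

$$= \frac{\partial C_\delta}{\partial C_\delta} \cdot [I_n \otimes F_\delta] \quad \text{from (T4.3).}$$

Hence

$$S_{\delta C}(c_h,c_v) = \bar U_{q\times n} \cdot [I_n \otimes F_\delta]. \qquad (4.3)$$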
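Fourth,

$$S_{\delta D}(c_h,c_v) = \frac{\partial}{\partial D_\delta}H_\delta(c_h,c_v) = \frac{\partial}{\partial D_\delta}\left[C_\delta(I_c - A_\delta)^{-1}B_\delta + D_\delta\right] = \frac{\partial D_\delta}{\partial D_\delta}.$$

Hence

$$S_{\delta D}(c_h,c_v) = \bar U_{q\times p} \in \Re^{q^2\times p^2}. \qquad (4.4)$$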

Lemma  4.1  The  quantities  5<5^^ (c/^ ,  Ci,),  5’^^^  (c/i, c^),  (c/i ,  Ct;),  and  56o^(c/i,Cy)  of  the  (5-modei  are 


(^/i  — 

S^2 

K 

••• 

]  )  j  C^,)  — 

slf 

si? 

■■■  g^’’ 

■  -  g? 

i 

-g<5n- 

rf(ir 

f(2)- 

f(i)' 

^2 

f(2)- 

5n 

p(2)* 

J  (^/l  j  ) 

-si? 

■^1,1 

^2,1 

(2) 

sl, 

Ei,2 

^2,2 

g“. 

’  '  ‘  ^l,P  ' 

■  ■  •  E2,p 

f(^)* 

aY 

AqV 

A/; 

- 

Eq,2 

■  * '  - 

^<5n 


Here,  denotes  a.  qxp  null  matrix  except  its  j-th  row  which  is  denotes 'd  q  x  p  null  matrix  except 

its  j-th  column  which  is  g^.,  and  Eij  are  n  x  p  elementary  matrices. 

Proof.  Relationship  for  follows  immediately  from  (4.4).  To  show  the  remainder,  note  that 


\F, 

0 

0  - 

'G^ 

0 

0  - 

[In  ®  = 

0 

F,  • 

0 

r-  OY’P'^^P' 
t  -O  , 

[In  ®  Ge]  = 

0 

Gs 

0 

£  ^nqxn'^ 

.  0 

0  • 

••  F,. 

_  0 

0 

Here,  [7„  (S)  Fs]  and  [In  each  has  nx  n  blocks.  Claim  now  follows  through  simple  yet  tedious  algebraic 

manipulations.  * 

Corollary 4.2. The quantities $S_{\delta A}(c_h,c_v)$, $S_{\delta B}(c_h,c_v)$, $S_{\delta C}(c_h,c_v)$, and $S_{\delta D}(c_h,c_v)$ of the δ-model and the quantities $S_{qA}(z_h,z_v)$, $S_{qB}(z_h,z_v)$, $S_{qC}(z_h,z_v)$, and $S_{qD}(z_h,z_v)$ of the corresponding q-model are related through the following:

)  |c— ~  Ss  Q^{Cfi ,  Cy  )  jc^z  —  )  ^v)  -i 

)  ^v)\c^z  ”  Fg(^^{zh^  Zy),  ‘5’5£,^(c/i,  Cy)\c-^z  —  ^qOq  ^v)i 

where  E  =  [Ahln^^q  0  Ayln^g]^ 

Proof  This  is  immediate  when  (3.15)  is  applied  to  Lemma  4.1.  ■ 


To  proceed  further,  we  utilize  the  following 

Definition  4.1.  Let  H^{ch^Cy)  be  a  bivariate  matrix-valued  function  that  is  analytic  on  Then, 


|j7/(5(c/i ,  )|lp  — 


(2 


j  £j\Hs{cu,cY[ 


dzf,  dzy 

Zh  Zy 
Remark. This matrix norm is extensively utilized in work related to coefficient sensitivity (see [15], and references therein), mainly because it leads to tractable results. This, and our desire to make a comparison with the corresponding q-model, are the primary reasons for its use here.


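FXP Arithmetic Case

For FXP arithmetic implementations, it is appropriate to define an absolute sensitivity measure as

$$M_{\delta FXP} = \|S_{\delta A}\|_1^2 + \frac{1}{p}\|S_{\delta B}\|_2^2 + \frac{1}{q}\|S_{\delta C}\|_2^2 + \frac{1}{pq}\|S_{\delta D}\|_2^2, \qquad (4.5)$$

where, as defined previously,

$$S_{\delta A} = \{s_{\delta A_{ij}}\} = \{\partial H_\delta/\partial a_{\delta_{ij}}\}; \quad S_{\delta B} = \{s_{\delta B_{ij}}\} = \{\partial H_\delta/\partial b_{\delta_{ij}}\};$$

$$S_{\delta C} = \{s_{\delta C_{ij}}\} = \{\partial H_\delta/\partial c_{\delta_{ij}}\}; \quad S_{\delta D} = \{s_{\delta D_{ij}}\} = \{\partial H_\delta/\partial d_{\delta_{ij}}\}.$$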


Remarks.

1. $M_{\delta FXP}$ takes into account variations in frequency response with respect to perturbations in $A_\delta$, $B_\delta$, $C_\delta$, and $D_\delta$. Note that, in FXP, the possible perturbation of a particular coefficient is approximately independent of its nominal value. This justifies the definition in (4.5).

2. Use of different norms is for mathematical feasibility and tractability, and is typical in coefficient sensitivity studies [15], [3]. Given a δ-model $\{A_\delta,B_\delta,C_\delta,D_\delta\}$, the objective is to characterize those realizations belonging to the class $\{\hat A_\delta,\hat B_\delta,\hat C_\delta,\hat D_\delta\} = \{TA_\delta T^{-1}, TB_\delta, C_\delta T^{-1}, D_\delta\}$, where $T$ is of the type in (3.12-13), that minimize $M_{\delta FXP}$.

3. Weights associated with each term in (4.5) may be thought of as averaging factors. The ensuing measure then may be thought of as an average sensitivity per input/output.

4. In a δ-operator implementation, due to the necessity of performing the computation in (3.6), coefficient sensitivity will be affected by perturbations of $\Delta_h$ and $\Delta_v$ as well. Hence, $M_{\delta FXP}$ must be modified to contain terms of the nature $\|S_{\delta\Delta_h}\|_2^2$ and $\|S_{\delta\Delta_v}\|_2^2$. However, selection of $\Delta_h$ and $\Delta_v$ is somewhat arbitrary, and they may be chosen to possess exact binary FXP representations. If so, the corresponding sensitivity terms may be neglected. In what follows, we assume that this is the case.
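Remark 4 relies on the sampling parameters having exact binary FXP representations. A minimal sketch of that check follows; the helper name `is_exact_fxp` and the chosen wordlengths are illustrative assumptions. A value is exactly representable with a given number of fractional bits precisely when it is an integer multiple of $2^{-\text{bits}}$.

```python
from fractions import Fraction

def is_exact_fxp(value, frac_bits):
    """True if value is an integer multiple of 2**(-frac_bits).

    value is passed as a decimal string so that Fraction holds it exactly.
    """
    scaled = Fraction(value) * 2 ** frac_bits
    return scaled.denominator == 1

# Dyadic choices such as 0.5 and 0.25 need only one or two fractional bits,
# whereas a value like 0.1 has no finite binary expansion.
print(is_exact_fxp("0.5", 1), is_exact_fxp("0.25", 2), is_exact_fxp("0.1", 16))
```

Choosing $\Delta_h$ and $\Delta_v$ as negative powers of two therefore removes their quantization error entirely, which is the situation assumed in the remainder of the section.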


We now attempt to obtain an expression for $M_{\delta FXP}$ as follows:

"g^i 


I|55.JI?=  1(2 


< 


h?ii 

(25rj)2  /  fy 


[f.',  ■  n. 


g<5„ 

g5i 

Lg^nJ 


dzh  dZy 

Zh  Zy 


2 

dzh. 

dZy 

Zv 

F 

r*  ]  I  ||2  dZh  dZy 

zIIf 


—  trace 


(27rj)2  fi  ^6  j  )G^  (c/i  ,  Cy  )  |c— *-z 


Zh  Zy 

dZh  dZy 


Zh  Zy 


•trace  ^  (cft ,  c„ )  F/  (cft ,  c,, )  |c 


'  Zh  Zy 
To get the first inequality, we have used mutual consistency of the Frobenius norm, that is, $\|AB\|_F \le \|A\|_F \cdot \|B\|_F$, and the Cauchy-Schwarz inequality; the last equality follows due to $\|A\|_F^2 = \mathrm{trace}[A^*A]$ [21]. Hence, using (3.15),

$$\|S_{\delta A}\|_1^2 \le \mathrm{trace}[P_q] \cdot \mathrm{trace}[\Gamma Q_q \Gamma] = (\Delta_h\Delta_v)^2 \cdot \mathrm{trace}[P_\delta] \cdot \mathrm{trace}[Q_\delta]. \qquad (4.7a)$$
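The two Frobenius-norm facts used in this step, $\|AB\|_F \le \|A\|_F\,\|B\|_F$ and $\|A\|_F^2 = \mathrm{trace}[A^*A]$, can be illustrated on small complex matrices. The sketch below uses hypothetical matrices and helper names; it checks one instance and is not a proof.

```python
import math

def matmul(a, b):
    """Plain matrix product of two lists-of-lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def fro(a):
    """Frobenius norm: square root of the sum of squared magnitudes."""
    return math.sqrt(sum(abs(x) ** 2 for row in a for x in row))

def trace_gram(a):
    """trace[A* A], where A* is the conjugate transpose of A."""
    return sum((x.conjugate() * x).real for row in a for x in row)

# Hypothetical complex test matrices.
A = [[1 + 2j, 0.5], [-1j, 3.0]]
B = [[2.0, -1.0], [0.5j, 1 - 1j]]

lhs = fro(matmul(A, B))       # ||AB||_F
rhs = fro(A) * fro(B)         # ||A||_F ||B||_F
print(lhs <= rhs, abs(fro(A) ** 2 - trace_gram(A)))
```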

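Next,

$$\|S_{\delta B}\|_2^2 = \frac{1}{(2\pi j)^2}\oint\oint \left\| [I_n \otimes G_\delta(c_h,c_v)]\,\bar U_{n\times p}\big|_{c\to z} \right\|_F^2 \frac{dz_h}{z_h}\frac{dz_v}{z_v}$$

$$= \frac{1}{(2\pi j)^2}\oint\oint p\,\left\| G_\delta(c_h,c_v)\big|_{c\to z} \right\|_F^2 \frac{dz_h}{z_h}\frac{dz_v}{z_v}$$

$$= p \cdot \mathrm{trace}\left[\frac{1}{(2\pi j)^2}\oint\oint G_\delta^*(c_h,c_v)\,G_\delta(c_h,c_v)\big|_{c\to z}\, \frac{dz_h}{z_h}\frac{dz_v}{z_v}\right].$$

Hence,

$$\|S_{\delta B}\|_2^2 = p \cdot \mathrm{trace}[\Gamma Q_q \Gamma] = p\,\Delta_h\Delta_v \cdot \mathrm{trace}[Q_\delta]. \qquad (4.7b)$$

Similarly, we get

$$\|S_{\delta C}\|_2^2 = q \cdot \mathrm{trace}[P_q] = q\,\Delta_h\Delta_v \cdot \mathrm{trace}[P_\delta], \qquad (4.7c)$$

and

$$\|S_{\delta D}\|_2^2 = pq. \qquad (4.7d)$$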

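Remark. In a similar manner, the q-system counterpart to (4.7) may be obtained as follows:

$$\|S_{qA}\|_1^2 \le \mathrm{trace}[P_q] \cdot \mathrm{trace}[Q_q] = (\Delta_h\Delta_v)^2 \cdot \mathrm{trace}[P_\delta] \cdot \mathrm{trace}[\Gamma^{-1}Q_\delta\Gamma^{-1}]; \qquad (4.8a)$$

$$\|S_{qB}\|_2^2 = p \cdot \mathrm{trace}[Q_q] = p\,\Delta_h\Delta_v \cdot \mathrm{trace}[\Gamma^{-1}Q_\delta\Gamma^{-1}]; \qquad (4.8b)$$

$$\|S_{qC}\|_2^2 = q \cdot \mathrm{trace}[P_q] = q\,\Delta_h\Delta_v \cdot \mathrm{trace}[P_\delta]; \qquad (4.8c)$$

$$\|S_{qD}\|_2^2 = pq. \qquad (4.8d)$$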


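Combining (4.5) with (4.7), we get the following upper bound for $M_{\delta FXP}$:

$$M_{\delta FXP} \le \bar M_{\delta FXP} = (\mathrm{trace}[P_q] + 1)(\mathrm{trace}[\Gamma Q_q \Gamma] + 1) = (\Delta_h\Delta_v \cdot \mathrm{trace}[P_\delta] + 1)(\Delta_h\Delta_v \cdot \mathrm{trace}[Q_\delta] + 1). \qquad (4.9)$$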

Due to difficulties associated with minimization of $M_{\delta FXP}$, it is customary to perform a minimization of the bound $\bar M_{\delta FXP}$ instead. Hence, one attempts to characterize those realizations $\{A_\delta,B_\delta,C_\delta,D_\delta\}$ that are 'bound optimal' with respect to $\bar M_{\delta FXP}$. For reasons of brevity, we do not attempt to perform this since it is exactly analogous to the 2-D q-operator case (see [15], and references therein). For instance, one may show that any realization that is BL modulo an orthogonal nonsingular transformation is bound optimal with respect to $\bar M_{\delta FXP}$.

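Remark. In a similar manner, the q-system counterpart to (4.9) may be shown to be

$$M_{qFXP} \le \bar M_{qFXP} = (\mathrm{trace}[P_q] + 1)(\mathrm{trace}[Q_q] + 1) = (\Delta_h\Delta_v \cdot \mathrm{trace}[P_\delta] + 1)(\Delta_h\Delta_v \cdot \mathrm{trace}[\Gamma^{-1}Q_\delta\Gamma^{-1}] + 1). \qquad (4.10)$$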
However, it is instructive to note that, compared to a q-operator implementation, its δ-model implementation will always yield a smaller $\bar M_{FXP}$ whenever

$$\mathrm{trace}[Q_q] > \mathrm{trace}[\Gamma Q_q \Gamma] \iff (1 - \Delta_h^2) \cdot \mathrm{trace}[Q_q^{(1)}] + (1 - \Delta_v^2) \cdot \mathrm{trace}[Q_q^{(4)}] > 0. \qquad (4.11)$$

Note that, with local reachability and observability of $\{A_\delta,B_\delta,C_\delta,D_\delta\}$ (and hence of $\{A_q,B_q,C_q,D_q\}$), positive definiteness of $Q_\delta^{(1)}$ and $Q_\delta^{(4)}$ (and hence of $Q_q^{(1)}$ and $Q_q^{(4)}$) is guaranteed. This implies strict positivity of $\mathrm{trace}[Q_\delta^{(1)}]$ and $\mathrm{trace}[Q_\delta^{(4)}]$ (and hence of $\mathrm{trace}[Q_q^{(1)}]$ and $\mathrm{trace}[Q_q^{(4)}]$). Thus, (4.11) is satisfied, that is, the δ-operator implementation has superior coefficient sensitivity, whenever

$$\Delta_h < 1 \quad \text{and} \quad \Delta_v < 1. \qquad (4.12)$$
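As an illustration of condition (4.11), the sketch below evaluates $(1-\Delta_h^2)\,\mathrm{trace}[Q_q^{(1)}] + (1-\Delta_v^2)\,\mathrm{trace}[Q_q^{(4)}]$ using the diagonal entries of the observability gramian blocks listed in the example of Section V. It assumes that $\Gamma$ applies $\Delta_h$ to the (1) block and $\Delta_v$ to the (4) block, and that the two 5 x 5 gramian blocks of the example are read as $Q_q^{(1)}$ and $Q_q^{(4)}$; the function and variable names are hypothetical.

```python
def trace_margin(tr_q1, tr_q4, delta_h, delta_v):
    """Left side of the inequality in (4.11)."""
    return (1.0 - delta_h ** 2) * tr_q1 + (1.0 - delta_v ** 2) * tr_q4

# Diagonal entries of the two observability gramian blocks in Section V.
q1_diag = [1.6225e+01, 3.4255e+01, 1.0392e+02, 3.8436e+02, 1.2571e+03]
q4_diag = [2.3806e-04, 2.7018e-05, 1.0828e-06, 2.2656e-08, 4.2699e-10]
tr_q1, tr_q4 = sum(q1_diag), sum(q4_diag)

delta_h, delta_v = 0.5, 0.25          # values selected in Section V
margin = trace_margin(tr_q1, tr_q4, delta_h, delta_v)

# trace[Gamma Q_q Gamma] for a block-diagonal Gamma.
tr_scaled = delta_h ** 2 * tr_q1 + delta_v ** 2 * tr_q4
# Ratio of the second factors of the q-model and delta-model bounds.
bound_ratio = (tr_q1 + tr_q4 + 1.0) / (tr_scaled + 1.0)
print(margin, bound_ratio)
```

Since the first factor $(\mathrm{trace}[P_q] + 1)$ is common to the q-model and δ-model bounds, `bound_ratio` is the ratio of the two bounds under these assumptions.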


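FLP Arithmetic Case

For FLP arithmetic implementations, it is appropriate to define a relative sensitivity measure as

$$M_{\delta FLP} = \|\tilde S_{\delta A}\|_1^2 + \frac{1}{p}\|\tilde S_{\delta B}\|_2^2 + \frac{1}{q}\|\tilde S_{\delta C}\|_2^2 + \frac{1}{pq}\|\tilde S_{\delta D}\|_2^2, \qquad (4.13)$$

where

$$\tilde S_{\delta A} = \{\tilde s_{\delta A_{ij}}\} = \{a_{\delta_{ij}}\,\partial H_\delta/\partial a_{\delta_{ij}}\}; \quad \tilde S_{\delta B} = \{\tilde s_{\delta B_{ij}}\} = \{b_{\delta_{ij}}\,\partial H_\delta/\partial b_{\delta_{ij}}\};$$

$$\tilde S_{\delta C} = \{\tilde s_{\delta C_{ij}}\} = \{c_{\delta_{ij}}\,\partial H_\delta/\partial c_{\delta_{ij}}\}; \quad \tilde S_{\delta D} = \{\tilde s_{\delta D_{ij}}\} = \{d_{\delta_{ij}}\,\partial H_\delta/\partial d_{\delta_{ij}}\}. \qquad (4.14)$$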


Remark. Note that, in FLP, the possible perturbation of a particular coefficient is approximately proportional to its nominal value. Hence, Li and Gevers [3], in addressing 1-D δ-system coefficient sensitivity, utilize a similar relative sensitivity measure.


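Lemma 4.3. The following bounds hold true:

$$\|\tilde S_{\delta A}\|_p \le \|A_\delta\|_F \cdot \|S_{\delta A}\|_p; \qquad \|\tilde S_{\delta B}\|_p \le \|B_\delta\|_F \cdot \|S_{\delta B}\|_p;$$

$$\|\tilde S_{\delta C}\|_p \le \|C_\delta\|_F \cdot \|S_{\delta C}\|_p; \qquad \|\tilde S_{\delta D}\|_p \le \|D_\delta\|_F \cdot \|S_{\delta D}\|_p.$$

Proof. Note that

$$\|\tilde S_{\delta A}\|_F^2 = \sum_i\sum_j \|\tilde s_{\delta A_{ij}}\|_F^2 = \sum_i\sum_j |a_{\delta_{ij}}|^2 \left\|\frac{\partial H_\delta}{\partial a_{\delta_{ij}}}\right\|_F^2 \le \sum_i\sum_j |a_{\delta_{ij}}|^2 \cdot \sum_i\sum_j \left\|\frac{\partial H_\delta}{\partial a_{\delta_{ij}}}\right\|_F^2 = \|A_\delta\|_F^2\,\|S_{\delta A}\|_F^2.$$

Now, using Definition 4.1, one may verify the claim. ■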
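The inequality at the heart of this proof is that a sum of products of nonnegative terms is bounded by the product of the sums. The sketch below checks it for hypothetical scalar data, that is, a SISO system at a single frequency point where each sensitivity $\partial H_\delta/\partial a_{\delta_{ij}}$ is one complex number; the values and names are illustrative only.

```python
def lemma_4_3_sides(coeffs, sens):
    """Return (sum |a s|^2, sum |a|^2 * sum |s|^2) for paired lists."""
    lhs = sum(abs(a * s) ** 2 for a, s in zip(coeffs, sens))
    rhs = sum(abs(a) ** 2 for a in coeffs) * sum(abs(s) ** 2 for s in sens)
    return lhs, rhs

# Hypothetical coefficients a_ij and scalar sensitivities dH/da_ij.
coeffs = [-0.054, 0.442, -0.362, 0.911]
sens = [1.5 - 0.2j, 0.3 + 0.8j, -0.7j, 0.05]

weighted, bound = lemma_4_3_sides(coeffs, sens)
print(weighted, bound)
```

The bound is loose when large coefficients are paired with small sensitivities, which is one reason the resulting measures are treated as upper bounds only.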

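Hence, substituting from (3.7), we get

$$M_{\delta FLP} \le \|\Gamma^{-1}(A_q - I)\|_F^2 \cdot \|S_{\delta A}\|_1^2 + \frac{1}{p}\|\Gamma^{-1}B_q\|_F^2 \cdot \|S_{\delta B}\|_2^2 + \frac{1}{q}\|C_q\|_F^2\,\|S_{\delta C}\|_2^2 + \|D_q\|_F^2. \qquad (4.15)$$

To proceed further, let us assume $\Delta_h = \Delta_v = \Delta$ for convenience. Then, we get

$$\|\Gamma^{-1}(A_q - I)\|_F^2 = \frac{1}{\Delta^2}\|A_q - I\|_F^2; \qquad \|\Gamma^{-1}B_q\|_F^2 = \frac{1}{\Delta^2}\|B_q\|_F^2. \qquad (4.16)$$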
Combining (4.15) with (4.16), we get the following upper bound for $M_{\delta FLP}$:

$$M_{\delta FLP} \le \bar M_{\delta FLP} = \|A_q - I\|_F^2 \cdot \mathrm{trace}[P_q]\,\mathrm{trace}[Q_q] + \|B_q\|_F^2 \cdot \mathrm{trace}[Q_q] + \|C_q\|_F^2 \cdot \mathrm{trace}[P_q] + \|D_q\|_F^2. \qquad (4.17)$$

Again, we perform a minimization of $\bar M_{\delta FLP}$.

Remark. In a similar manner, the q-system counterpart to (4.17) may be shown to be

$$M_{qFLP} \le \bar M_{qFLP} = \|A_q\|_F^2 \cdot \mathrm{trace}[P_q]\,\mathrm{trace}[Q_q] + \|B_q\|_F^2 \cdot \mathrm{trace}[Q_q] + \|C_q\|_F^2 \cdot \mathrm{trace}[P_q] + \|D_q\|_F^2. \qquad (4.18)$$

Hence, compared to a q-operator implementation, its δ-model implementation will yield a smaller $\bar M_{FLP}$ whenever

$$\|A_q - I\|_F < \|A_q\|_F. \qquad (4.19)$$

Clearly, 

■"  ^  ■  -  ■  5  11^9  ~  ^hIIf  <  II^^IIf)  (4.20) 

where $\lambda_i[A_q]$ denotes the $i$-th eigenvalue of $A_q$.

Remark. Li and Gevers [3] refer to the above region (where the eigenvalues of $A_q$ should lie) as the Middleton-Goodwin (MG) region. They show that, for the 1-D case, δ-systems offer superior performance (with respect to coefficient sensitivity) if the system eigenvalues lie within this MG region. It is well known that high performance, high-Q, narrowband digital filters that operate under high sampling rates routinely satisfy this requirement. Hence, implementation of such filters via the proposed δ-model is expected to offer significant advantages over the conventional q-model.
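Condition (4.19) can be restated through the trace: for any real $n \times n$ matrix, $\|A\|_F^2 - \|A - I\|_F^2 = 2\,\mathrm{trace}[A] - n$, so (4.19) holds exactly when the eigenvalues of $A_q$ have an average real part above $1/2$. The sketch below is an illustration under stated assumptions rather than part of the original derivation: it verifies the identity on a hypothetical 2 x 2 matrix and then compares it with the Frobenius norms reported in Section V, using the diagonal entries of $A_q^{(1)}$ and $A_q^{(4)}$ listed there.

```python
def fro_sq(a):
    """Squared Frobenius norm of a real matrix."""
    return sum(x * x for row in a for x in row)

def minus_identity(a):
    """Return A - I for a square real matrix."""
    return [[x - (1.0 if i == j else 0.0) for j, x in enumerate(row)]
            for i, row in enumerate(a)]

# Direct check on a hypothetical 2 x 2 matrix.
a = [[0.9, 0.3], [-0.1, 0.7]]
gap_direct = fro_sq(a) - fro_sq(minus_identity(a))
gap_trace = 2.0 * (a[0][0] + a[1][1]) - len(a)

# Diagonals of A_q^(1) and A_q^(4) from Section V (n = 10 states in total).
diag_a1 = [9.7288e-01, 9.0641e-01, 8.3446e-01, 7.8877e-01, 7.4779e-01]
diag_a4 = [9.7561e-01, 9.1467e-01, 8.4368e-01, 7.9757e-01, 7.5989e-01]
predicted_gap = 2.0 * (sum(diag_a1) + sum(diag_a4)) - 10

# (||A||_F, ||A - I||_F) pairs reported in Section V for three realizations;
# similarity transformations preserve the trace, so the gap is the same.
reported = [(3.8561, 2.7904), (2.8612, 1.0502), (2.9115, 1.1805)]
reported_gaps = [na ** 2 - nd ** 2 for na, nd in reported]
print(gap_direct, gap_trace, predicted_gap, reported_gaps)
```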
V. Example


To illustrate the notions presented above, we consider a stable 5h-5v 2-D separable digital filter.

Computations

All numerical values are displayed via FORMAT SHORT E of MATLAB [22], which was used for all computations. Related references, equations, and MATLAB routines (displayed in typewriter font) associated with each result are indicated within ⟨angle brackets⟩. Note that, since the system being considered has $A^{(3)} = 0$ (instead of $A^{(2)} = 0$), relevant equations must be appropriately modified.


Given q-model {A_q, B_q, C_q, D_q}. ⟨(2.1)⟩

A_q^(1) =
[  9.7288e-01   2.2120e-01  -1.8087e-01   4.5533e-01  -6.6164e-01 ]
[ -4.9620e-02   9.0641e-01   5.0270e-01  -4.8744e-01   9.9640e-01 ]
[ -4.4045e-03  -5.4572e-02   8.3446e-01   7.4341e-01  -7.0769e-01 ]
[ -8.0077e-04  -3.8212e-03  -5.3688e-02   7.8877e-01   8.3955e-01 ]
[ -9.1701e-05  -6.1563e-04  -4.0265e-03  -6.6197e-02   7.4779e-01 ];

A_q^(2) =
[  6.5685e-04  -3.5577e-04   6.5445e-05  -4.7450e-06   7.6323e-08 ]
[  3.7312e-04  -2.0156e-04   3.7191e-05  -2.8504e-06   9.4968e-08 ]
[  8.3423e-05  -4.5142e-05   8.3131e-06  -6.1509e-07   1.3848e-08 ]
[  1.0923e-05  -6.0377e-06   1.0848e-06  -4.3584e-08  -1.0501e-08 ]
[  1.3843e-06  -8.0173e-07   1.3642e-07   5.0958e-09  -4.8691e-09 ];

A_q^(3) = [0];

A_q^(4) =
[  9.7561e-01   5.1035e-02  -4.7944e-03   9.6702e-04  -1.0621e-04 ]
[ -2.0308e-01   9.1467e-01   5.7625e-02  -4.1281e-03   6.5748e-04 ]
[ -1.5349e-01  -4.6361e-01   8.4368e-01   5.4495e-02  -3.9295e-03 ]
[ -4.0166e-01  -4.3093e-01  -7.0705e-01   7.9757e-01   6.1499e-02 ]
[ -5.9732e-01  -9.2922e-01  -6.9051e-01  -8.3231e-01   7.5989e-01 ];

B_q^(1) = [ 4.0636e-05   2.3637e-05   5.2062e-06   5.4906e-07   3.1533e-08 ]^T;

B_q^(2) = [ 8.2566e-01   2.0553e+00   4.2147e+00   7.7342e+00   1.3116e+01 ]^T;

C_q^(1) = [ 8.8401e-01  -2.2366e+00   4.6091e+00  -8.4165e+00   1.3754e+01 ];

C_q^(2) = [ 4.2897e-05  -2.3778e-05   4.2589e-06  -1.5122e-07  -4.7764e-08 ];

D_q = [ 2.0931e-06 ].


Gramians P_q and Q_q. Note that P_q^(2) = P_q^(3) = Q_q^(2) = Q_q^(3) = 0. Computation of P_q^(1), P_q^(4), Q_q^(1), and Q_q^(4) is easily done using four Lyapunov equations. ⟨[12], dlyap⟩

P_q^(1) =
[  2.3413e-04   1.6529e-10   7.2257e-11  -1.9205e-12  -8.7578e-13 ]
[  1.6529e-10   2.4873e-05  -3.3583e-11   9.4761e-12  -9.2682e-13 ]
[  7.2257e-11  -3.3583e-11   8.8928e-07  -7.8387e-12  -8.2826e-13 ]
[ -1.9205e-12   9.4761e-12  -7.8387e-12   1.7156e-08  -2.4931e-12 ]
[ -8.7578e-13  -9.2682e-13  -8.2826e-13  -2.4931e-12   3.4901e-10 ];

P_q^(4) =
[  1.5911e+01  -8.0022e-04   3.9261e-04  -2.8748e-04   1.1908e-03 ]
[ -8.0022e-04   3.1738e+01  -1.8067e-03   2.0229e-03   5.8131e-04 ]
[  3.9261e-04  -1.8067e-03   8.9993e+01  -1.2184e-02  -4.3646e-03 ]
[ -2.8748e-04   2.0229e-03  -1.2184e-02   3.2282e+02  -3.9363e-01 ]
[  1.1908e-03   5.8131e-04  -4.3646e-03  -3.9363e-01   1.1173e+03 ];

Q_q^(1) =
[  1.6225e+01  -1.1684e-04   5.0102e-05   2.7371e-04   7.3819e-04 ]
[ -1.1684e-04   3.4255e+01   1.4225e-04   1.4539e-05   2.2515e-05 ]
[  5.0102e-05   1.4225e-04   1.0392e+02   1.3240e-02  -1.9289e-02 ]
[  2.7371e-04   1.4539e-05   1.3240e-02   3.8436e+02   7.0539e-01 ]
[  7.3819e-04   2.2515e-05  -1.9289e-02   7.0539e-01   1.2571e+03 ];

Q_q^(4) =
[  2.3806e-04   6.5596e-06  -1.5778e-06   1.9727e-07  -1.3032e-08 ]
[  6.5596e-06   2.7018e-05   1.0571e-06  -1.3366e-07   1.0258e-08 ]
[ -1.5778e-06   1.0571e-06   1.0828e-06   3.9683e-08  -3.5788e-09 ]
[  1.9727e-07  -1.3366e-07   3.9683e-08   2.2656e-08   7.4571e-10 ]
[ -1.3032e-08   1.0258e-08  -3.5788e-09   7.4571e-10   4.2699e-10 ].

Hankel singular values Σ_q. Use Laub's algorithm to simultaneously diagonalize the pairs {P_q^(1), Q_q^(1)} and {P_q^(4), Q_q^(4)}. ⟨[17], chol, svd, dbalreal⟩

Σ_q^(1) = diag{ 6.1633e-02   2.9190e-02   9.6133e-03   2.5679e-03   6.6237e-04 };

Σ_q^(4) = diag{ 6.1613e-02   2.9240e-02   9.6278e-03   2.5201e-03   6.2671e-04 }.
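The Hankel singular values are the square roots of the eigenvalues of the product of the reachability and observability gramians. When both gramians are nearly diagonal, $\sqrt{p_{ii}\,q_{ii}}$ already gives a close estimate. The sketch below applies this shortcut, which is an approximation introduced here and not Laub's algorithm, to the diagonals of the (1) blocks listed above, assuming they are read as $P_q^{(1)}$ and $Q_q^{(1)}$. The (4) blocks have relatively larger off-diagonal entries, so the shortcut is less accurate there.

```python
import math

# Diagonals of the (1) gramian blocks and the listed singular values.
p1_diag = [2.3413e-04, 2.4873e-05, 8.8928e-07, 1.7156e-08, 3.4901e-10]
q1_diag = [1.6225e+01, 3.4255e+01, 1.0392e+02, 3.8436e+02, 1.2571e+03]
sigma1 = [6.1633e-02, 2.9190e-02, 9.6133e-03, 2.5679e-03, 6.6237e-04]

# Diagonal-only estimate: sqrt(p_ii * q_ii).
approx = [math.sqrt(p * q) for p, q in zip(p1_diag, q1_diag)]
rel_err = [abs(a - s) / s for a, s in zip(approx, sigma1)]
print(approx)
print(max(rel_err))
```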


BL q-model {A_qB, B_qB, C_qB, D_qB}. ⟨dbalreal⟩

A_qB^(1) =
[  9.7288e-01  -1.0477e-01   2.8226e-02   1.9100e-02   7.8007e-03 ]
[  1.0477e-01   9.0641e-01   1.6563e-01   4.3176e-02   2.4797e-02 ]
[  2.8226e-02  -1.6563e-01   8.3445e-01  -1.9981e-01  -5.3485e-02 ]
[ -1.9098e-02   4.3169e-02   1.9980e-01   7.8859e-01  -2.3575e-01 ]
[  7.7946e-03  -2.4778e-02  -5.3426e-02   2.3570e-01   7.4798e-01 ];

A_qB^(2) =
[  1.6387e-01  -1.9665e-01   1.3359e-01  -6.4934e-02   2.5945e-02 ]
[ -1.9656e-01   2.3525e-01  -1.6009e-01   7.9605e-02  -3.3830e-02 ]
[ -1.3338e-01   1.5991e-01  -1.0869e-01   5.3270e-02  -2.1779e-02 ]
[  6.4945e-02  -7.9570e-02   5.3310e-02  -2.1310e-02   3.3012e-03 ]
[ -2.9278e-02   3.7620e-02  -2.4427e-02   4.8708e-03   5.9844e-03 ];

A_qB^(3) = [0];

A_qB^(4) =
[  9.7288e-01   1.0493e-01  -2.8224e-02   1.9068e-02  -6.9404e-03 ]
[ -1.0493e-01   9.0655e-01   1.6553e-01  -4.3176e-02   2.2040e-02 ]
[ -2.8225e-02  -1.6553e-01   8.3466e-01   1.9846e-01  -4.7565e-02 ]
[ -1.9070e-02  -4.3183e-02  -1.9847e-01   7.8716e-01   2.1543e-01 ]
[ -6.9480e-03  -2.2064e-02  -4.7636e-02  -2.1550e-01   7.9017e-01 ];

B_qB^(1) = [ 6.5931e-04  -8.0973e-04  -5.4131e-04   2.1245e-04  -4.3537e-05 ]^T;

B_qB^(2) = [ 5.4458e-02   6.5289e-02   4.4379e-02   2.1755e-02   8.9064e-03 ]^T;

C_qB^(1) = [ 5.4485e-02   6.5289e-02  -4.4331e-02  -2.1760e-02  -9.9934e-03 ];

C_qB^(2) = [ 6.5904e-04  -8.0975e-04   5.4156e-04  -2.0980e-04   2.3502e-05 ];

D_qB = [ 2.0931e-06 ].


Corresponding δ-model {A_δ, B_δ, C_δ, D_δ}. In FXP, selection of the 'sampling times' $\Delta_h$ and $\Delta_v$ must be carefully done because they directly influence the number of bits required for the integral and fractional portions of each coefficient. Actually, the relationships developed in Section III can be invaluable in determining suitable $\Delta_h$ and $\Delta_v$ so that the range of coefficient values of {A_δB, B_δB, C_δB, D_δB} is acceptable. Let us select

Δ_h = 5.0000e-01;  Δ_v = 2.5000e-01.

Note that these are exactly representable (see remarks after (4.6)) and the resulting coefficients of the BL δ-model are each within [-1, 1].

Of course, in FLP, such a difficulty does not usually arise because of the large dynamic range available. Hence, although one may choose smaller $\Delta_h$ and/or $\Delta_v$, we will continue to use the same values.

Accordingly, we get the following δ-model: ⟨(3.5), (3.7)⟩
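The conversion can be spot-checked against the listed numbers. The sketch below assumes that (3.7) reads $A_\delta = \Gamma^{-1}(A_q - I)$ and $B_\delta = \Gamma^{-1}B_q$, with $\Delta_h$ acting on the horizontal block and $\Delta_v$ on the vertical block. It recomputes the first row of $A_\delta^{(1)}$ and the vector $B_\delta^{(2)}$ from the q-model values and compares them with the printed five-digit δ-model entries. The helper names are hypothetical.

```python
delta_h, delta_v = 0.5, 0.25

# First row of A_q^(1) and the corresponding listed row of A_delta^(1).
a1_q_row1 = [9.7288e-01, 2.2120e-01, -1.8087e-01, 4.5533e-01, -6.6164e-01]
a1_d_row1 = [-5.4240e-02, 4.4240e-01, -3.6174e-01, 9.1066e-01, -1.3233e+00]

# B_q^(2) and the listed B_delta^(2).
b2_q = [8.2566e-01, 2.0553e+00, 4.2147e+00, 7.7342e+00, 1.3116e+01]
b2_d = [3.3026e+00, 8.2212e+00, 1.6859e+01, 3.0937e+01, 5.2464e+01]

def delta_row(row_q, diag_index, delta):
    """One row of Gamma^{-1} (A_q - I) within a block scaled by delta."""
    return [(x - (1.0 if j == diag_index else 0.0)) / delta
            for j, x in enumerate(row_q)]

def max_rel_err(computed, listed):
    """Largest relative deviation between computed and printed values."""
    return max(abs(c - l) / abs(l) for c, l in zip(computed, listed))

err_a = max_rel_err(delta_row(a1_q_row1, 0, delta_h), a1_d_row1)
err_b = max_rel_err([x / delta_v for x in b2_q], b2_d)
print(err_a, err_b)
```

The residual differences are at the level expected from rounding the printed values to five significant digits.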


A_δ^(1) =
[ -5.4240e-02   4.4240e-01  -3.6174e-01   9.1066e-01  -1.3233e+00 ]
[ -9.9240e-02  -1.8718e-01   1.0054e+00  -9.7488e-01   1.9928e+00 ]
[ -8.8090e-03  -1.0914e-01  -3.3108e-01   1.4868e+00  -1.4154e+00 ]
[ -1.6015e-03  -7.6424e-03  -1.0738e-01  -4.2246e-01   1.6791e+00 ]
[ -1.8340e-04  -1.2313e-03  -8.0530e-03  -1.3239e-01  -5.0442e-01 ];

A_δ^(2) =
[  1.3137e-03  -7.1154e-04   1.3089e-04  -9.4900e-06   1.5265e-07 ]
[  7.4624e-04  -4.0312e-04   7.4382e-05  -5.7008e-06   1.8994e-07 ]
[  1.6685e-04  -9.0284e-05   1.6626e-05  -1.2302e-06   2.7696e-08 ]
[  2.1846e-05  -1.2075e-05   2.1696e-06  -8.7168e-08  -2.1002e-08 ]
[  2.7686e-06  -1.6035e-06   2.7284e-07   1.0192e-08  -9.7382e-09 ];

A_δ^(3) = [0];

--9.7560e-02  2.0414e  -  01 

-8.1232e-01  -3.4132e-01 
-6.1396e-01  -1.8544e  +  00 
-1.6066e+00  -1.7237e  +  00 
.-2.3893e  +  00  -3.7169e  +  00 


-1.9178e-02 
2.3050e  -  01 


1.8544e  +  00  -6.2528e  -  01 
1.7237e  +  00  -2.8282e  +  00 


3.8681e  -  03 
-1.6512e  -02 
2.1798e  -  01 
-8.0972e  -  01 


-4.2484e 

2.6299e 

-1.5718e 

2.4600e 


L-2.3893e  +  00  -3.7169e  +  00  -2.7620e  +  00  -3.3292e  +  00  -9.6044e - 
B_δ^(1) = [ 8.1272e-05   4.7274e-05   1.0412e-05   1.0981e-06   6.3066e-08 ]^T;

B_δ^(2) = [ 3.3026e+00   8.2212e+00   1.6859e+01   3.0937e+01   5.2464e+01 ]^T;

C_δ^(1) = [ 8.8401e-01  -2.2366e+00   4.6091e+00  -8.4165e+00   1.3754e+01 ];

C_δ^(2) = [ 4.2897e-05  -2.3778e-05   4.2589e-06  -1.5122e-07  -4.7764e-08 ];

D_δ = [ 2.0931e-06 ].


BL δ-model {A_δB, B_δB, C_δB, D_δB}. ⟨Section III⟩


--5.4240e-02 
2.0953e  -  01 
5.6452e  -  02 
-3.8196e  -  02 
.  1.5589e-02 
■  4.6349e-01 
-5.5595e  -  01 
-3.7724e  -  01 
1.8369e-01 
.-8.2812e-02 

0]; 

•-1.0847e-01 
-4.1972e-01 
-1.1290e-01 
-7.6279e  -  02 
.-2.7792e-02 


-2.0953e-01 
-1.8718e-01 
-3.3126e  -01 
8.6339e  -  02 
-4.9556e  -  02 
-5.5622e  -01 
6.6538e  -  01 
4.5228e  -  01 
-2.2506e  -01 
1.0641e-01 

4.1972e  -01 
-3.7381e-01 
-6.6214e  -01 
-1.7273e  -01 
-8.8254e  -  02 


5.6452e  -  02 
3.3126e  -01 
-3.3110e  -01 
3.9960e  -  01 
-1.0685e  -01 
3.7784e  -  01 
-4.5280e  -  01 
-3.0743e  -  01 
1.5078e  -  01 
-6.9089e  -  02 


3.8199e 
8.6351e 
-3.9962e 
-4.2283e 
4.7139e 
-1.8366e 
2.2516e 
1.5067e 
-6.0274e  ■ 
1.3777e  • 


-1.1290e  -01 
6.6214e  -  01 
-6.6135e  -01 
-7.9389e  -  01 
-1.9054e  -01 


[9.3241e-04  -1.1451e-  03  -7.6552e 


I  7.6273e  - 
[  -1.7271e- 

1  7.9384e  - 

1  -8.5135e  - 

i  -8.6201e  - 

3.0045e  -  04 


1.5601e  -02- 
4.9595e  -  02 
-1.0697e  -01 
-4.7150e  -01 
-5.0404e  -01. 

7.3384e  -02' 
-9.5685e  -  02 
-6.1600e  -  02 
9.3373e  -  03 
1.6926e  -02. 

-2.7762e  -  02' 
8.8160e  -02 
-1.9026e  -01 
8.6171e  -01 
-8.3933e  -  01. 

ic'rn.  nr  "[T  _ 


-6.1570e 
B_δB^(2) = [ 1.0892e-01   1.3058e-01   8.8758e-02   4.3511e-02   1.7813e-02 ]^T;

C_δB^(1) = [ 7.7053e-02   9.2333e-02  -6.2693e-02  -3.0773e-02  -1.4133e-02 ];

C_δB^(2) = [ 1.3181e-03  -1.6195e-03   1.0831e-03  -4.1961e-04   4.7004e-05 ];

D_δB = [ 2.0931e-06 ].


Simulations 

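Normalized frequency response of $\{A_q,B_q,C_q,D_q\}$ is

$$H_q(e^{j\omega_1}, e^{j\omega_2}) = C_q(I_z - A_q)^{-1}B_q + D_q \Big|_{z_1 = e^{j\omega_1},\ z_2 = e^{j\omega_2}},$$

whereas normalized frequency response of $\{A_\delta,B_\delta,C_\delta,D_\delta\}$ is

$$H_\delta\big((e^{j\omega_1} - 1)/\Delta_h,\ (e^{j\omega_2} - 1)/\Delta_v\big) = C_\delta(I_c - A_\delta)^{-1}B_\delta + D_\delta \Big|_{c_1 = (e^{j\omega_1} - 1)/\Delta_h,\ c_2 = (e^{j\omega_2} - 1)/\Delta_v}.$$

Frequency responses are evaluated on the following grid:

$$\Omega^2 = \{(\omega_1,\omega_2) : \omega_i = n_i\pi/N,\ n_i = [-N : 1 : N],\ i = 1,2\},$$

and we selected $N = 32$.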


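For comparison purposes, the following measure was also evaluated:

$$E_{\max} = \begin{cases} \max_{\Omega^2} \left| H(e^{j\omega_1}, e^{j\omega_2}) - \hat H(e^{j\omega_1}, e^{j\omega_2}) \right|, & \text{for q-models;} \\ \max_{\Omega^2} \left| H\big((e^{j\omega_1} - 1)/\Delta_h, (e^{j\omega_2} - 1)/\Delta_v\big) - \hat H\big((e^{j\omega_1} - 1)/\Delta_h, (e^{j\omega_2} - 1)/\Delta_v\big) \right|, & \text{for δ-models.} \end{cases}$$

Here, $H$ denotes the 'ideal' frequency response where each coefficient is represented in 'infinite' precision. $\hat H$ denotes the 'actual' frequency response where each coefficient is represented in finite precision.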
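The kind of experiment behind this measure can be sketched on a much simpler system. The code below is not the 2-D filter of this section: it uses a hypothetical 1-D first-order system with made-up coefficients, truncates each coefficient to a given number of fractional bits (integral part kept exactly), and evaluates the maximum frequency-response error on a grid $\omega = n\pi/N$ for both the q-form and the δ-form. All names and numbers are assumptions for illustration.

```python
import cmath
import math

def truncate(x, frac_bits):
    """Truncate x to frac_bits fractional bits (integral part kept exactly)."""
    step = 2.0 ** (-frac_bits)
    return math.floor(x / step) * step

def h_q(a, b, c, d, z):
    """Scalar q-model response c (z - a)^{-1} b + d."""
    return c * b / (z - a) + d

def h_delta(a_d, b_d, c_d, d_d, z, delta):
    """Scalar delta-model response evaluated at (z - 1)/delta."""
    return c_d * b_d / ((z - 1.0) / delta - a_d) + d_d

def e_max(frac_bits, delta, a=0.97288, b=0.05, c=0.05, d=0.0, n_grid=32):
    """Return (q-model error, delta-model error) maximized over the grid."""
    a_d, b_d = (a - 1.0) / delta, b / delta      # scalar version of (3.7)
    err_q = err_d = 0.0
    for k in range(-n_grid, n_grid + 1):
        z = cmath.exp(1j * math.pi * k / n_grid)
        ideal = h_q(a, b, c, d, z)
        qa, qb, qc, qd = (truncate(v, frac_bits) for v in (a, b, c, d))
        da, db, dc, dd = (truncate(v, frac_bits) for v in (a_d, b_d, c, d))
        err_q = max(err_q, abs(ideal - h_q(qa, qb, qc, qd, z)))
        err_d = max(err_d, abs(ideal - h_delta(da, db, dc, dd, z, delta)))
    return err_q, err_d

for bits in (6, 10, 14):
    print(bits, e_max(bits, delta=0.5))
```

In infinite precision the two forms give identical responses, so any difference between the two error values comes only from where the truncation acts. No particular ordering of the two errors is implied by this toy case.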

Fig.  (1).  Plot  shows  the  ideal  frequency  response.  Note  that,  in  'infinite’  precision,  all  realizations  give 
identical  results. 


Fig. (2). Here, each coefficient is represented in FXP and its fractional part is truncated at different lengths. The integral part is represented exactly. Plot shows $E_{\max}$ versus number of fractional bits.

Remarks.

1. Advantage gained by the BL model over the given model is 6-7 bits.

2. Advantage gained by the δ-model over its corresponding q-model is only 1 bit.

3.  A  small  tracefQJ  implies  trace[Q^]-f  1  trace Hence,  from  (4.9-10),  no  dramatic  improvement 
in  5-model  can  be  expected.  This  explains  the  modest  gains  in  item  2  for  this  particular  example. 

Fig. (3). Here, each coefficient is represented in FXP and its total (integral + fractional) number of bits is truncated at different lengths. Plot shows $E_{\max}$ versus total number of bits.

Remarks.

1. This comparison is more realistic than what is in Fig. (2).

2. Only the BL realizations are shown; the dynamic ranges of the given systems (q and δ) are too large.

3. Advantage gained by the δ-model over its corresponding q-model is 1-2 bits.

4. This modest improvement is due to the large $\Delta_h$ and $\Delta_v$ being used. More dramatic improvements require smaller $\Delta_h$ and $\Delta_v$ (see (4.11)). However, this makes the δ-model's coefficients occupy a higher dynamic range. To circumvent this difficulty, careful scaling of filter coefficients must be performed. This is a research topic we are currently tackling.

Fig. (4). Here, each coefficient is represented in FLP and its number of mantissa bits is truncated at different lengths. Plot shows $E_{\max}$ versus number of mantissa bits.

Remarks.

1. No apparent advantage is gained by using the corresponding BL model.

2. However, δ-models (BL or not) provide consistently better results with advantages of 3-4 bits.

3. Note that $\|A_q - I_{10}\|_F = 2.7904\mathrm{e}{+}00 < 3.8561\mathrm{e}{+}00 = \|A_q\|_F$, $\|A_{qB} - I_{10}\|_F = 1.0502\mathrm{e}{+}00 < 2.8612\mathrm{e}{+}00 = \|A_{qB}\|_F$, and $\|A_{\delta B2q} - I_{10}\|_F = 1.1805\mathrm{e}{+}00 < 2.9115\mathrm{e}{+}00 = \|A_{\delta B2q}\|_F$. This explains the significant improvements shown by δ-models. For the particular example being considered, these differences between the two sides are not very high; if they were, the improvement shown by the corresponding δ-model would be more dramatic.
VI. Conclusion and Final Remarks


In  this  paper,  we  have  developed  a  <5-operator  based  counterpart  to  the  more  conventional  g-opercitor  based 
Roesser  local  s,s.  model.  The  motivation  for  this  work  lies  in  the  superior  finite  wordlength  properties 
exhibited  by  1-D  ^-operator  based  DT  systems. 

Corresponding  notions  of  gramians  and  BL  realization  are  proposed.  By  revealing  the  relationship 
between  BL  realizations  of  corresponding  8-  and  g~models,  we  have  addressed  computation  of  gramians 
and  BL  realizations  as  well.  For  both  FXP  and  FLP  implementations,  conditions  under  which  proposed 
^-operator  formulated  systems  behave  better — with  respect  to  coefficient  sensitivity — than  its  ^-operator 
counterpart  are  derived. 

In  the  FXP  case,  6-model  is  better  whenever  A/,  <  1  and  <  1.  However,  this  choice  must  be  carefully 
done  since,  in  FXP,  6-models  tend  to  occupy  a  larger  dynamic  range.  The  authors  are  currently  investigating 
the  possibility  of  incorporating  scaling  of  coefficients  so  that  low  values  of  A/^  and  A^,  may  be  used  to  expose 
and  exploit  the  advantages  of  6-systems. 

In the FLP case, such a limitation regarding dynamic range does not usually arise, and δ-models are
better whenever the system matrix eigenvalues lie within the MG-region. This condition typically holds for
high-Q, narrowband digital filters operating at high sampling rates. We believe that, under these conditions,
the proposed δ-models can yield significantly superior performance. In FLP, for comparable performance
(with respect to coefficient sensitivity), δ-models require a shorter mantissa length. The ensuing implications
regarding low power consumption, low cost and weight, and high speed cannot be overemphasized.

This work only addresses coefficient sensitivity issues. The authors are currently completing work
regarding quantization noise properties of the δ-model developed, where, as in the 1-D case, improvements
over the corresponding q-model are expected.

We must mention that certain difficulties regarding limit cycles are inherent in δ-systems when FXP
arithmetic is used [23]. However, this problem is, for all practical purposes, nonexistent in FLP arithmetic.
Hence, in our opinion, for FLP high performance applications, the δ-model developed provides an extremely
attractive solution that avoids the numerical ill-conditioning typically associated with high speed q-systems.

Figure 1. FXP: E_max vs. fractional number of bits per coefficient.

Figure 2. FXP: E_max vs. total number of bits per coefficient.

Abstract: In this paper, the convergence properties of linearly stable multi-dimensional
systems are investigated for the case of delta-operator implementations in fixed-point format.
It is shown that zero-convergence is almost never achieved if the sampling time is small. Using a
one-dimensional analysis, it is demonstrated that zero-convergence cannot be guaranteed along the
axes of the first hyper-quadrant for a first hyper-quadrant causal system. This limits the use
of delta-operators for solving partial differential equations in discrete time with fixed-point
arithmetic.


I. INTRODUCTION

Delta-operator (or δ-operator) implementations of discrete-time systems have been the topic
of a number of research papers within the last decade. A comprehensive treatment of the
properties of δ-operator implementations can be found in [1]. It is well known that
δ-operators outperform shift-operators (or q-operators) in terms of their finite wordlength
properties [2]. In particular, its quantization noise and sensitivity properties make the
δ-operator an interesting alternative to the q-operator in areas such as digital control,
digital signal processing, and, more generally, discrete-time simulation of dynamical systems
described by differential equations [1], [3].

In this paper, we will perform a deterministic analysis of the finite wordlength properties
of multi-dimensional (m-D) δ-operator implemented discrete-time systems. In particular, we
will investigate the zero-convergence of δ-operator fixed-point implementations of
one-dimensional (1-D) and m-D systems. Although it is of vital importance, this problem has
not been investigated thus far in the literature. After all, asymptotic stability and
convergence to the true equilibrium points are some of the most fundamental requirements for
any discrete-time system realization.

This article is organized in the following way: Section II introduces the notation. The m-D
δ-operator model will be introduced and briefly discussed. This section will also provide the
problem formulation. Section III provides necessary 1-D stability conditions for m-D first
hyper-quadrant causal systems with nonlinearities. Using these necessary conditions,
Section IV provides a stability and convergence analysis for m-D systems. It will be shown
that the resulting 1-D systems cannot ensure zero-convergence. Section V contains concluding
remarks.


II.  NOTATION  AND  PROBLEM  FORMULATION 

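The m-D Roesser model has the following δ-operator formulation [4]:

$$
\begin{bmatrix} \delta^{(1)}[x^{(1)}](n) \\ \vdots \\ \delta^{(m)}[x^{(m)}](n) \end{bmatrix}
=
\begin{bmatrix} A^{\delta}_{11} & \cdots & A^{\delta}_{1m} \\ \vdots & & \vdots \\ A^{\delta}_{m1} & \cdots & A^{\delta}_{mm} \end{bmatrix}
\begin{bmatrix} x^{(1)}(n) \\ \vdots \\ x^{(m)}(n) \end{bmatrix}
+
\begin{bmatrix} B^{\delta}_{1} \\ \vdots \\ B^{\delta}_{m} \end{bmatrix} u(n);
\tag{1}
$$

$$
\begin{bmatrix} q^{(1)}[x^{(1)}](n) \\ \vdots \\ q^{(m)}[x^{(m)}](n) \end{bmatrix}
=
\begin{bmatrix} x^{(1)}(n) \\ \vdots \\ x^{(m)}(n) \end{bmatrix}
+ \Delta \cdot
\begin{bmatrix} \delta^{(1)}[x^{(1)}](n) \\ \vdots \\ \delta^{(m)}[x^{(m)}](n) \end{bmatrix}.
\tag{2}
$$
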

The input-state equations in (1) and (2) describe a first hyper-quadrant causal m-D system
with a uniform sampling period of Δ in all directions. The operators $q^{(i)}$ and
$\delta^{(i)}$ represent the shift- and delta-operator in the direction specified by the
axis $n_i$. In particular,

$$
q^{(i)}[x^{(i)}](n) = x^{(i)}(n_1, \ldots, n_{i-1}, n_i + 1, n_{i+1}, \ldots, n_m);
\tag{3a}
$$

$$
\delta^{(i)}[x^{(i)}](n) = \frac{1}{\Delta}\left( x^{(i)}(n_1, \ldots, n_{i-1}, n_i + 1, n_{i+1}, \ldots, n_m) - x^{(i)}(n) \right).
\tag{3b}
$$

Here, $(n) = (n_1, \ldots, n_m)$ denotes a point in the first hyper-quadrant, $x^{(i)}(n)$ is
the portion of the state vector propagating in the direction specified by the axis $n_i$,
$u(n)$ is the m-D input vector, and $A^{\delta}_{ij}$ and $B^{\delta}_{i}$, for
$i = 1, \ldots, m$, $j = 1, \ldots, m$, are the submatrices of the system and input matrices,
respectively.
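
The directional operators in (3a) and (3b) can be sketched in a few lines. The block below is only an illustration under stated assumptions: the names `shift` and `delta`, the 0-based axis index, and the test signal `x` are hypothetical and do not come from the paper. It checks the identity behind (2), namely that shifting along an axis equals adding Δ times the delta along that axis.

```python
from fractions import Fraction

def shift(x, n, i):
    """q^(i)[x](n): evaluate x one step ahead along axis i (0-based here)."""
    n_next = n[:i] + (n[i] + 1,) + n[i + 1:]
    return x(n_next)

def delta(x, n, i, Delta):
    """delta^(i)[x](n) = (q^(i)[x](n) - x(n)) / Delta, as in (3b)."""
    return (shift(x, n, i) - x(n)) / Delta

# Hypothetical 2-D test signal, chosen only for illustration.
def x(n):
    return Fraction(3) * n[0] ** 2 + Fraction(5) * n[1]

Delta = Fraction(1, 8)
n = (2, 7)
for axis in range(2):
    # Identity behind (2): q = 1 + Delta * delta along every axis.
    assert shift(x, n, axis) == x(n) + Delta * delta(x, n, axis, Delta)
```

Exact rational arithmetic (`fractions.Fraction`) is used so that the identity holds without floating-point error.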

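If (1) is realized in fixed-point arithmetic, it takes the following form under zero-input
conditions:

$$
\begin{bmatrix} \delta^{(1)}[x^{(1)}](n) \\ \vdots \\ \delta^{(m)}[x^{(m)}](n) \end{bmatrix}
= Q\left\{
\begin{bmatrix} A^{\delta}_{11} & \cdots & A^{\delta}_{1m} \\ \vdots & & \vdots \\ A^{\delta}_{m1} & \cdots & A^{\delta}_{mm} \end{bmatrix}
\begin{bmatrix} x^{(1)}(n) \\ \vdots \\ x^{(m)}(n) \end{bmatrix}
\right\},
\tag{4}
$$

where

$$
Q\{x\} = \begin{pmatrix} Q\{x_1\} \\ \vdots \\ Q\{x_m\} \end{pmatrix}
\quad \text{with} \quad
x = \begin{pmatrix} x_1 \\ \vdots \\ x_m \end{pmatrix}.
$$
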

Equation (4) assumes quantization after summation; we adopt this scheme since practically
all modern DSP machines implement it. The vector-valued quantization nonlinearity Q{·} may
represent any one of the conventional schemes, viz., magnitude truncation, magnitude rounding,
two's complement truncation, and two's complement rounding.
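
The four quantization schemes named above can be written as scalar maps onto the lattice {k·ε}. The following is a minimal sketch, assuming a quantization step `eps` and the tie-breaking rules stated in the docstrings; the paper does not spell out tie handling, so those rules and all function names are assumptions.

```python
from fractions import Fraction
import math

def magnitude_truncation(v, eps):
    """Drop the excess toward zero: sign(v) * floor(|v| / eps) * eps."""
    return math.trunc(v / eps) * eps

def twos_complement_truncation(v, eps):
    """Drop the excess toward minus infinity: floor(v / eps) * eps."""
    return math.floor(v / eps) * eps

def magnitude_rounding(v, eps):
    """Nearest lattice point, ties away from zero (assumed tie rule)."""
    k = math.floor(abs(v) / eps + Fraction(1, 2))
    return (k if v >= 0 else -k) * eps

def twos_complement_rounding(v, eps):
    """Nearest lattice point, ties toward plus infinity (assumed tie rule)."""
    return math.floor(v / eps + Fraction(1, 2)) * eps

def Q(vec, eps, scalar_quantizer):
    """Vector-valued nonlinearity Q{.}: quantize each component separately."""
    return [scalar_quantizer(v, eps) for v in vec]

eps = Fraction(1, 4)
v = [Fraction(3, 8), Fraction(-3, 8)]   # both components sit exactly on a tie
for scheme in (magnitude_truncation, magnitude_rounding,
               twos_complement_truncation, twos_complement_rounding):
    print(scheme.__name__, Q(v, eps, scheme))
```

The sample vector is placed on a tie so that the four schemes visibly disagree on the negative component.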

Equation (2) can be implemented in two different forms:

$$
\begin{bmatrix} q^{(1)}[x^{(1)}](n) \\ \vdots \\ q^{(m)}[x^{(m)}](n) \end{bmatrix}
=
\begin{bmatrix} x^{(1)}(n) \\ \vdots \\ x^{(m)}(n) \end{bmatrix}
+ Q\left\{ \Delta \cdot
\begin{bmatrix} \delta^{(1)}[x^{(1)}](n) \\ \vdots \\ \delta^{(m)}[x^{(m)}](n) \end{bmatrix}
\right\};
\tag{5}
$$

$$
\begin{bmatrix} q^{(1)}[x^{(1)}](n) \\ \vdots \\ q^{(m)}[x^{(m)}](n) \end{bmatrix}
= Q\left\{
\begin{bmatrix} x^{(1)}(n) \\ \vdots \\ x^{(m)}(n) \end{bmatrix}
+ \Delta \cdot
\begin{bmatrix} \delta^{(1)}[x^{(1)}](n) \\ \vdots \\ \delta^{(m)}[x^{(m)}](n) \end{bmatrix}
\right\}.
\tag{6}
$$

Equation (5) corresponds to quantization after multiplication, whereas (6) corresponds to
quantization after addition. In contrast to (1), for (2), it is not obvious which of the two
forms stated above is preferable.
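
To see that the two forms are not interchangeable in general, the scalar sketch below applies both to the same state and the same quantized difference. It is an illustration only: the numerical values, the function names, and the restriction to a single scalar state are assumptions made here.

```python
from fractions import Fraction
import math

def mag_trunc(v, eps):
    """Magnitude truncation onto the lattice {k * eps}."""
    return math.trunc(v / eps) * eps

def floor_trunc(v, eps):
    """Two's complement truncation onto the lattice {k * eps}."""
    return math.floor(v / eps) * eps

def update_after_multiplication(x, d, Delta, eps, Q):
    """Form (5): x(n+1) = x(n) + Q{Delta * delta[x](n)}."""
    return x + Q(Delta * d, eps)

def update_after_addition(x, d, Delta, eps, Q):
    """Form (6): x(n+1) = Q{x(n) + Delta * delta[x](n)}."""
    return Q(x + Delta * d, eps)

eps = Fraction(1, 8)
Delta = Fraction(1, 4)
x = Fraction(1, 2)     # current state, on the lattice
d = Fraction(-1, 8)    # quantized delta[x](n), also on the lattice

for name, Q in (("magnitude truncation", mag_trunc),
                ("two's complement truncation", floor_trunc)):
    print(name,
          update_after_multiplication(x, d, Delta, eps, Q),
          update_after_addition(x, d, Delta, eps, Q))
```

In this particular example the two forms agree under two's complement truncation but differ under magnitude truncation, because truncating the sum toward zero depends on the sign of the sum rather than on the sign of the increment.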

The following definition of asymptotic stability [5] will be used throughout this paper.

Definition. An m-D first hyper-quadrant causal discrete-time system is asymptotically stable
under all finitely extended bounded input signals $u(n)$, where

$$
|u_\nu(n)| \le S, \quad \text{for } n_1 + \cdots + n_m \le D;
\tag{7}
$$

$$
u_\nu(n) = 0, \quad \text{for } n_1 + \cdots + n_m > D,
\tag{8}
$$

if all the states of the m-D discrete-time system asymptotically reach zero for
$n_1 + \cdots + n_m \to \infty$. Here, $\nu = 1, \ldots, m$, $S$ is a nonnegative real number,
and $D$ is a positive integer.

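Since the fixed-point systems considered are in fact finite state machines, the condition

$$
\begin{pmatrix} x^{(1)}(n) \\ \vdots \\ x^{(m)}(n) \end{pmatrix} \to 0
$$

for $n_1 + \cdots + n_m \to \infty$, $n_\nu \ge 0$, $\nu = 1, \ldots, m$, can be strengthened to

$$
\begin{pmatrix} x^{(1)}(n) \\ \vdots \\ x^{(m)}(n) \end{pmatrix} = 0
$$

for all points $n_1 + \cdots + n_m > c$, $n_\nu \ge 0$, $\nu = 1, \ldots, m$,
where $c$ is some finite integer.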
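
Because a fixed-point realization is a finite state machine, its free response must eventually repeat a state, so convergence can be decided by iterating until the first repetition. The sketch below does this for a scalar 1-D example; the helper `settle`, the parameter values, and the tie rule used in `round_half_up` are assumptions for illustration, not part of the paper.

```python
from fractions import Fraction
import math

def settle(step, x0, max_steps=10_000):
    """Iterate a finite-state map until a state repeats.

    Returns (transient_length, period, states_on_the_cycle)."""
    seen = {}        # state -> first time index at which it occurred
    history = []
    x = x0
    while x not in seen:
        if len(history) >= max_steps:
            raise RuntimeError("no repeated state within max_steps")
        seen[x] = len(history)
        history.append(x)
        x = step(x)
    start = seen[x]
    return start, len(history) - start, history[start:]

# Hypothetical scalar example: a = -1 (linearly stable), small sampling time.
eps, Delta, a = Fraction(1, 16), Fraction(1, 8), Fraction(-1)

def round_half_up(v):
    """Rounding to the lattice {k * eps}, ties toward plus infinity (assumed)."""
    return math.floor(v / eps + Fraction(1, 2)) * eps

def step(x):
    d = round_half_up(a * x)               # delta[x](n) = Q{a x(n)}
    return x + round_half_up(Delta * d)    # x(n+1) = x(n) + Q{Delta * delta}

transient, period, cycle = settle(step, Fraction(1))
print(transient, period, cycle)
```

A period of one with a nonzero state on the cycle means the response has stopped at a point other than the origin, so the strengthened zero-convergence condition fails for this initial state.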

Problem Formulation. Analyze the asymptotic zero-convergence of the state response of the
systems in (4,5) and (4,6) under the assumption that the underlying linear system is
asymptotically stable.

III.  NECESSARY  CONDITIONS  FOR 
GLOBAL  ASYMPTOTIC  STABILITY 
OF  m-D  SYSTEMS 

In this section, we present some necessary conditions for stability of a first
hyper-quadrant causal m-D discrete-time system represented by its Roesser local state-space
model in (1,2). These necessary conditions are formulated in terms of 1-D conditions. The
resulting theorem follows directly from a result in [6] which was formulated for q-operator
implemented discrete-time systems. The proof of the theorem rests on the fact that a first
hyper-quadrant m-D system can be described by a 1-D system for those locations that are along
the m coordinate axes of the boundary of the hyper-quadrant. Reformulating the result in [6]
for δ-operator systems produces the following theorem:


Theorem 1.

(a) A necessary condition for global asymptotic stability of the system in (4,5) is that each
of the following m 1-D systems in (9,10) is globally asymptotically stable:

$$
\delta^{(i)}[x^{(i)}](n_i) = Q\left\{ A^{\delta}_{ii}\, x^{(i)}(n_i) \right\};
\tag{9}
$$

$$
q^{(i)}[x^{(i)}](n_i) = x^{(i)}(n_i) + Q\left\{ \Delta \cdot \delta^{(i)}[x^{(i)}](n_i) \right\},
\tag{10}
$$

where $i = 1, \ldots, m$.

(b) A necessary condition for global asymptotic stability of the system in (4,6) is that each
of the following m 1-D systems in (11,12) is globally asymptotically stable:

$$
\delta^{(i)}[x^{(i)}](n_i) = Q\left\{ A^{\delta}_{ii}\, x^{(i)}(n_i) \right\};
\tag{11}
$$

$$
q^{(i)}[x^{(i)}](n_i) = Q\left\{ x^{(i)}(n_i) + \Delta \cdot \delta^{(i)}[x^{(i)}](n_i) \right\},
\tag{12}
$$

where $i = 1, \ldots, m$.

Proof. For a detailed proof, and generalizations to higher sub-dimensional systems, the reader
is referred to [6]. ■

Theorem 1 can be viewed as an extension of the concept of practical BIBO stability to
asymptotic stability of nonlinear systems. It is particularly useful in proving instability in
m-D nonlinear systems.


IV. NECESSARY CONDITIONS FOR
GLOBAL ASYMPTOTIC STABILITY
OF 1-D SYSTEMS

Let us rewrite (9), (10), and (12) as 1-D matrix equations of order K. In this case, (9),
(10), and (12) yield (13), (14), and (15), respectively:

$$
\begin{bmatrix} \delta[x_1](n) \\ \vdots \\ \delta[x_K](n) \end{bmatrix}
= Q\left\{
\begin{bmatrix} a^{\delta}_{11} & \cdots & a^{\delta}_{1K} \\ \vdots & & \vdots \\ a^{\delta}_{K1} & \cdots & a^{\delta}_{KK} \end{bmatrix}
\begin{bmatrix} x_1(n) \\ \vdots \\ x_K(n) \end{bmatrix}
\right\};
\tag{13}
$$

$$
\begin{bmatrix} x_1(n+1) \\ \vdots \\ x_K(n+1) \end{bmatrix}
=
\begin{bmatrix} x_1(n) \\ \vdots \\ x_K(n) \end{bmatrix}
+ Q\left\{ \Delta \cdot
\begin{bmatrix} \delta[x_1](n) \\ \vdots \\ \delta[x_K](n) \end{bmatrix}
\right\};
\tag{14}
$$

$$
\begin{bmatrix} x_1(n+1) \\ \vdots \\ x_K(n+1) \end{bmatrix}
= Q\left\{
\begin{bmatrix} x_1(n) \\ \vdots \\ x_K(n) \end{bmatrix}
+ \Delta \cdot
\begin{bmatrix} \delta[x_1](n) \\ \vdots \\ \delta[x_K](n) \end{bmatrix}
\right\}.
\tag{15}
$$

Now, we are in a position to formulate the second theorem, which presents a necessary
condition for stability of 1-D systems.

Theorem 2. A necessary condition for global asymptotic stability of the system in (13,14) or
(13,15) is given by

$\Delta \ge 0.5$, for magnitude rounding;

$\Delta \ge 1$, for truncating.

Proof. For global asymptotic stability of (13,14), it is necessary that

$$
Q\left\{ \Delta \cdot
\begin{bmatrix} \delta[x_1](n) \\ \vdots \\ \delta[x_K](n) \end{bmatrix}
\right\} \neq 0,
\quad \text{for any} \quad
\begin{bmatrix} x_1(n) \\ \vdots \\ x_K(n) \end{bmatrix} \neq 0.
\tag{16}
$$

First, we will address the case of magnitude rounding. Obviously, condition (16) is violated
if, for $x_\nu \neq 0$,

$$
|\Delta \cdot \delta[x_\nu](n)| < \frac{\varepsilon}{2}, \quad \text{for } \nu = 1, \ldots, K,
\tag{17}
$$

where $\varepsilon$ is the quantization step. Expressing the sampling time Δ as an integer
multiple of $\varepsilon$, we have

$$
\Delta = l \cdot \varepsilon,
\tag{18}
$$

where $l$ is some (typically small) positive integer. With (17) and (18), we obtain the
following condition for instability:

$$
|\delta[x_\nu](n)| < \frac{1}{2l}, \quad \nu = 1, \ldots, K,
\tag{19}
$$

for $x_\nu \neq 0$, $\nu = 1, \ldots, K$.

Since $\delta[x_\nu](n)$ is itself a quantized quantity, its smallest nonzero magnitude is
$\varepsilon$. Condition (19) is therefore not satisfied for any nonzero value of $x_\nu$
(that is, the condition for instability is not satisfied) if $\varepsilon \ge 1/(2l)$, or
equivalently,

$$
\Delta \ge 0.5.
\tag{20}
$$

This proves the theorem for magnitude rounding.

For the case of magnitude truncating, (17) takes the form

$$
|\Delta \cdot \delta[x_\nu](n)| < \varepsilon, \quad \text{for } \nu = 1, \ldots, K.
\tag{21}
$$

Therefore, (19) becomes

$$
|\delta[x_\nu](n)| < \frac{1}{l}.
\tag{22}
$$

This finally yields

$$
\Delta \ge 1.
\tag{23}
$$

For two's complement, (17) takes the form

$$
0 \le \Delta \cdot \delta[x_\nu](n) < \varepsilon, \quad \text{for } \nu = 1, \ldots, K.
\tag{24}
$$

This results in

$$
0 \le \delta[x_\nu](n) < \frac{1}{l},
\tag{25}
$$

and consequently, $\Delta \ge 1$. This proves the theorem for the system in (13,14). A similar
argument can be used for the system in (13,15) by considering the cases for which

$$
\begin{bmatrix} x_1(n) \\ \vdots \\ x_K(n) \end{bmatrix}
= Q\left\{
\begin{bmatrix} x_1(n) \\ \vdots \\ x_K(n) \end{bmatrix}
+ \Delta \cdot
\begin{bmatrix} \delta[x_1](n) \\ \vdots \\ \delta[x_K](n) \end{bmatrix}
\right\}
\tag{26}
$$

for nonzero state vectors. ■

We can now combine Theorems 1 and 2 to formulate a necessary condition for stability of m-D
first hyper-quadrant causal δ-operator formulations of the generalized Roesser model.

Corollary 3. A necessary condition for global asymptotic stability of the m-D systems in (4,5)
or (4,6) is

$\Delta \ge 0.5$, for magnitude rounding;

$\Delta \ge 1$, for truncating.

Proof. The proof follows from Theorems 1 and 2. ■

Comments.

1. Theorem 2 and Corollary 3 are also essentially applicable to the case where the sampling
time varies with the direction of propagation. In this case, the inequalities in Theorem 2
and Corollary 3 would have to be replaced by

$\Delta_i \ge 0.5$, for magnitude rounding;

$\Delta_i \ge 1$, for truncating,

for $i = 1, \ldots, m$.

2. Most of the previous results on the superior finite wordlength properties of δ-operators
depend on choosing a very small sampling time Δ. In such a case, Theorem 2 and Corollary 3
show that the system response will not converge to zero for the unforced case.

3. Our analysis is limited to the zero-input case, for which DC limit cycles were used to
derive conditions for non-convergence. If one includes other types of limit cycles in the
analysis, the requirements for Δ may become even more severe.

4. Theorem 2 and Corollary 3 show that fixed-point implementations of 1-D and m-D δ-operator
systems cannot be realized limit cycle free if good coefficient sensitivity and quantization
noise measures have to be achieved. See also [7].


V. CONCLUSION

In this paper, it was shown that fixed-point implementations of 1-D and m-D δ-operator
systems are not limit cycle free even if the underlying linear system is stable and the
sampling time is chosen small. This non-convergent behavior can be explained by the
quantization of the δ-term to zero, which leaves the state vector unchanged. The smaller the
sampling time, the more severe this effect is. Therefore, the practical value of δ-operators
for fixed-point implementations of 1-D and m-D systems is questionable. There are, however,
indications that this effect is much less severe in floating-point implementations.
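
The dependence of the stuck state on the sampling time can be reproduced with a small experiment. The sketch below is an illustration under assumptions: a scalar system with a = -1, an 8-bit fractional lattice, magnitude rounding with ties away from zero, and the quantization-after-multiplication update. None of these numerical choices come from the paper.

```python
from fractions import Fraction
import math

def magnitude_rounding(v, eps):
    """Nearest lattice point, ties away from zero (assumed tie rule)."""
    k = math.floor(abs(v) / eps + Fraction(1, 2))
    return (k if v >= 0 else -k) * eps

def rest_state(a, Delta, eps, x0, max_steps=100_000):
    """Run the zero-input response until the state stops changing."""
    x = x0
    for _ in range(max_steps):
        d = magnitude_rounding(a * x, eps)                # delta = Q{a x}
        x_next = x + magnitude_rounding(Delta * d, eps)   # x + Q{Delta * delta}
        if x_next == x:
            return x
        x = x_next
    raise RuntimeError("state did not settle to a period-one point")

eps = Fraction(1, 256)
sampling_times = (Fraction(1, 2), Fraction(1, 4), Fraction(1, 8), Fraction(1, 16))
rest = {D: rest_state(Fraction(-1), D, eps, Fraction(1)) / eps for D in sampling_times}
for D, steps in rest.items():
    print(f"Delta = {D}: rest state = {steps} quantization steps")
```

For these particular values the response reaches zero only at the largest sampling time, and the nonzero rest state grows as the sampling time is reduced, in line with the deadband explanation given above.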

δ-operator implemented discrete-time systems represent a class of systems where the
quantization noise at the output can be small compared to other realizations. However, as was
shown above, such realizations will invariably exhibit limit cycle behavior, that is, highly
correlated quantization noise. Therefore, in this case, typical measures for quantization
noise are of very limited use for obtaining any insight into the likelihood of limit cycles,
and vice versa.
ABSTRACT

This paper analyzes the problem of global asymptotic stability of delta-operator formulated
discrete-time systems implemented in fixed-point arithmetic. It is shown that the free
response of such a system tends to produce period one limit cycles if conventional
quantization arithmetic schemes are used. Explicit necessary conditions for global asymptotic
stability are derived, and these demonstrate that, in almost all cases, fixed-point arithmetic
does not allow for global asymptotic stability in delta-operator formulated discrete-time
systems that use a short sampling time.


I.  INTRODUCTION 

Recently, discrete-time systems formulated with the incremental difference operator (or
δ-operator) have been receiving considerable attention in the technical literature [1-4].
Most of this work focuses on their superior performance under finite wordlength conditions
when compared with systems formulated with the shift-operator (or q-operator). In particular,
investigations of coefficient sensitivity and quantization noise properties have revealed that
δ-operator formulations usually perform significantly better than their q-operator
counterparts [1-4]. This is especially true for high-speed applications where the sampling
rate is much larger than the underlying system bandwidth. Under these conditions, q-operator
formulated discrete-time systems tend to become ill-conditioned [1-2].

Although a large amount of work is available on the effects of coefficient sensitivity and
quantization noise, a deterministic study of the nonlinear behavior of discrete-time systems
formulated with the δ-operator has not been undertaken. In the case of floating-point (FLP)
arithmetic, some results for feedback systems are available in [2].

In this work, we focus on the convergence behavior of the unforced system response and global
asymptotic stability of δ-operator formulated discrete-time systems implemented in fixed-point
(FXP) arithmetic. In particular, via necessary conditions for stability, it will be shown
that such systems tend to produce DC limit cycles.

The structure of this article is as follows: In Section II, we introduce notation and
nomenclature. The model for δ-operator formulated discrete-time systems, with and without
quantization nonlinearities, is briefly discussed. Section III addresses the problem of
asymptotic stability when FXP arithmetic is used for the implementation. In terms of ensuing
DC limit cycles, necessary conditions for global asymptotic stability are formulated. It is
shown that, when FXP arithmetic is used, stability of the linear system is often lost.
Section IV provides concluding remarks.

II.  NOTATION  AND  NOMENCLATURE 

Since our focus is on investigation of stability properties of δ-operator formulated
discrete-time systems under unforced conditions, the state equations of the system under
zero-input will be considered.

In the linear case, the general m-th order state-space representation is given by

$$
\delta[x](n) = A^{\delta} x(n);
\tag{1}
$$

$$
x(n+1) = x(n) + \Delta \cdot \delta[x](n),
\tag{2}
$$

where $x(n) = [x_1(n), \ldots, x_m(n)]^T$ is the state vector at instant $n$,
$A^{\delta} = \{a^{\delta}_{ij}\} \in \Re^{m \times m}$ is the system matrix, and
$\Delta > 0$ is the sampling time. Moreover, $\delta[\cdot]$ represents the δ-operator, that is,

$$
\delta[x_\nu](n) = \frac{x_\nu(n+1) - x_\nu(n)}{\Delta}, \quad \forall \nu = 1, \ldots, m,
\tag{3}
$$

and $\delta[x](n) = [\delta[x_1](n), \ldots, \delta[x_m](n)]^T$. The actual implementation of
(1) and (2) in FXP format gives rise to nonlinear quantization operations that occur at
various locations depending on the hardware realization.

Eqn. (1) can be implemented either by using single wordlength accumulators (creating a
quantization error after each multiplication) or by using double wordlength accumulators
(creating a quantization error only after summation). We will only consider the latter option
since practically all modern DSP machines implement this. Eqn. (1) can then be written as

$$
\delta[x](n) = Q\{A^{\delta} x(n)\},
\tag{4}
$$

where Q is a vector-valued quantization nonlinearity of the form

$$
Q\{x\} = \begin{pmatrix} Q\{x_1\} \\ \vdots \\ Q\{x_m\} \end{pmatrix}.
\tag{5}
$$

Here, $Q\{x_\nu\}$ denotes magnitude truncation, two's complement truncation, or rounding.

Eqn. (2) can be implemented in two different ways:

$$
x(n+1) = x(n) + Q\{\Delta \cdot \delta[x](n)\},
\tag{6}
$$

or

$$
x(n+1) = Q\{x(n) + \Delta \cdot \delta[x](n)\}.
\tag{7}
$$

Eqn. (6) corresponds to quantization after multiplication while (7) corresponds to
quantization after summation. In contrast to (1), for (2), it is not clear which of the two
quantization schemes in (6) and (7) is preferable. We will therefore consider both
possibilities.

Throughout this paper, we will use the following definition of stability:

Definition. The discrete-time system in {(4), (6)} or {(4), (7)} is globally asymptotically
stable if, and only if, for any initial condition $x(0)$, the state vector $x$ asymptotically
reaches zero, that is, $x(n) \to 0$ for $n \to \infty$.

Comment. Since the FXP systems considered are in fact finite state machines, the condition
$x(n) \to 0$ for $n \to \infty$ may be restated as $x(N) = 0$ for some finite $N$ [5].

Finally, the symbol $\varepsilon$ is used to denote the quantization step.


III. NECESSARY CONDITIONS
FOR STABILITY

First, we will consider the system described by {(4), (6)}. From the definition of global
asymptotic stability as stated in the previous section, it is necessary that

$$
Q\{\Delta \cdot \delta[x](n)\} \neq 0, \quad \text{for any } x(n) \neq 0.
\tag{8}
$$

This is just one of a finite set of conditions that is required to ensure global asymptotic
stability of a FXP implementation of a linearly stable system [5].

In the case of rounding, condition (8) is violated if

$$
|\Delta \cdot \delta[x_\nu](n)| < \frac{\varepsilon}{2}, \quad \text{for any } \nu = 1, \ldots, m.
\tag{9}
$$



The sampling time Δ in a δ-operator formulated implementation is typically very small. With
$\Delta = l \cdot \varepsilon$ and (9), we have

$$
|\delta[x_\nu](n)| < \frac{1}{2l}, \quad \text{for any } \nu = 1, \ldots, m,
\tag{10}
$$

where $l$ is a positive integer.

In the case of magnitude truncation, (10) takes the form

$$
|\delta[x_\nu](n)| < \frac{1}{l}, \quad \text{for any } \nu = 1, \ldots, m.
\tag{11}
$$

Accordingly, for two's complement truncation, we have

$$
0 \le \delta[x_\nu](n) < \frac{1}{l}, \quad \text{for any } \nu = 1, \ldots, m.
\tag{12}
$$

Conditions (10-12) describe the deadband, in terms of $\delta[x]$, for which a DC limit cycle
occurs. Such a limit cycle can be avoided if (10-12) are satisfied by the zero vector only.
In the case of rounding, we therefore require

$$
\varepsilon \ge \frac{1}{2l}
$$

or, equivalently,

$$
\Delta \ge \frac{1}{2},
\tag{13}
$$

which is impractical. Similarly, for magnitude and two's complement truncation, we obtain

$$
\varepsilon \ge \frac{1}{l} \iff \Delta \ge 1,
\tag{14}
$$

which again is equally impractical.

This result is summarized in the following theorem.

Theorem 1. A necessary condition for stability of the δ-operator formulated discrete-time
system in {(4), (6)} is $\Delta \ge 0.5$ for rounding and $\Delta \ge 1$ for truncation.
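
The thresholds in Theorem 1 follow from asking whether the smallest nonzero lattice value of the quantized difference, namely one quantization step, still falls inside the deadband. A minimal sketch of that check is given below; the function name, the scheme labels, and the sample values of Δ are assumptions made for illustration.

```python
from fractions import Fraction

def deadband_has_nonzero_point(Delta, eps, scheme):
    """True if a nonzero lattice value of delta[x] is quantized away.

    Only the smallest positive lattice value d = eps needs to be tested:
    if it escapes the deadband, so does every larger multiple of eps."""
    d = eps
    if scheme == "rounding":
        return abs(Delta * d) < eps / 2      # deadband as in (9)
    if scheme in ("magnitude truncation", "two's complement truncation"):
        return Delta * d < eps               # deadband for d > 0
    raise ValueError(f"unknown scheme: {scheme}")

eps = Fraction(1, 256)
for Delta in (Fraction(1, 4), Fraction(1, 2), Fraction(3, 4), Fraction(1)):
    print(Delta,
          deadband_has_nonzero_point(Delta, eps, "rounding"),
          deadband_has_nonzero_point(Delta, eps, "magnitude truncation"))
```

Under these assumptions the deadband contains a nonzero point exactly when Δ is below 0.5 for rounding and below 1 for truncation, independent of the chosen quantization step.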


The above theorem shows that high-speed δ-operator formulated implementations that possess a
small sampling time cannot be realized limit cycle free in FXP format!

A second necessary condition for the system in {(4), (6)} can be obtained by noting that

$$
\delta[x](n) = 0
\tag{15}
$$

can occur in (4) even though the state vector $x(n) \neq 0$. Therefore, for rounding, no
nonzero state vector $x(n)$ that satisfies

$$
-\begin{pmatrix} \varepsilon/2 \\ \vdots \\ \varepsilon/2 \end{pmatrix}
< A^{\delta} \cdot x(n) <
+\begin{pmatrix} \varepsilon/2 \\ \vdots \\ \varepsilon/2 \end{pmatrix}
\tag{16}
$$

may be allowed to exist. Here, the inequality has to hold elementwise. Taking norms on both
sides of (16), one gets an algebraic condition on the system matrix $A^{\delta}$ that always
supports DC limit cycles. Eqn. (16) has the following interesting interpretations:

1. Each of the resulting m inequalities can be geometrically interpreted as the intersection
of two half spaces in $\Re^m$. These intersections are symmetric about the origin and have
parallel boundaries. The normal vector to the boundaries is given by the particular row
vector of $A^{\delta}$. Only if the intersection of all such m half spaces contains a nonzero
point in $\Re^m$, and if it belongs to the quantization lattice, will there exist a nonzero
state vector that is an equilibrium point of the system.

2. Eqn. (16) can also be interpreted from an eigenvalue/eigenvector viewpoint. In high-speed
digital filters where the sampling frequency is typically much higher than the bandwidth of
the processed signal, a q-operator implementation's eigenvalues cluster around the point
$z = 1$ [1]. The corresponding δ-operator implementation for large sampling times has
eigenvalues clustered around zero. However, as the sampling time becomes small, these
eigenvalues move towards the eigenvalues of the underlying continuous-time system [1]. In
other words, for large sampling times, the system matrix will be ill-conditioned, that is,
vectors $x(n) \neq 0$ exist such that $A^{\delta} \cdot x(n)$ is close to the zero vector.
According to (16), this is likely to cause a DC limit cycle. For small sampling times, this
problem may not occur; however, in this case, the conditions in Theorem 1 are not satisfied!

In the case of the remaining two quantization schemes, the inequalities corresponding to (16)
are given as follows: For two's complement truncation,

$$
0 \le A^{\delta} \cdot x(n) <
\begin{pmatrix} \varepsilon \\ \vdots \\ \varepsilon \end{pmatrix},
\quad x(n) \neq 0,
\tag{17}
$$

and, for magnitude truncation,

$$
-\begin{pmatrix} \varepsilon \\ \vdots \\ \varepsilon \end{pmatrix}
< A^{\delta} \cdot x(n) <
+\begin{pmatrix} \varepsilon \\ \vdots \\ \varepsilon \end{pmatrix},
\quad x(n) \neq 0.
\tag{18}
$$

A similar analysis can be conducted for the system in {(4), (7)}. Since (4) is common to both
realizations, (16-18) are still valid and provide conditions under which the finite difference
is quantized to zero and a DC limit cycle is produced. We will now briefly discuss necessary
conditions for global asymptotic stability obtained from (7).



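For rounding, proceeding as in (9), we have

$$
|\Delta \cdot \delta[x_\nu](n)| < \frac{\varepsilon}{2}, \quad \text{for any } \nu = 1, \ldots, m,
$$

and therefore

$$
|\delta[x_\nu](n)| < \frac{1}{2l}, \quad \text{for any } \nu = 1, \ldots, m.
\tag{19}
$$

For magnitude truncation, we obtain

$$
0 < \delta[x_\nu](n) < \frac{1}{l}, \quad \forall\, \delta[x_\nu] > 0,
\tag{20}
$$

and

$$
-\frac{1}{l} < \delta[x_\nu](n) < 0, \quad \forall\, \delta[x_\nu] < 0.
\tag{21}
$$

In the case of two's complement truncation, the condition for a DC limit cycle is given by

$$
0 \le \delta[x_\nu](n) < \frac{1}{l}, \quad \forall \nu = 1, \ldots, m.
\tag{22}
$$

With $\Delta = l \cdot \varepsilon$, $l$ being a 'small' integer, we come to the same
conclusion as for the previously considered system:

$\Delta \ge \frac{1}{2}$ for rounding;

$\Delta \ge 1$ for truncation.

Therefore, Theorem 1 also holds for the system representation in {(4), (7)}.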
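
Returning to the second necessary condition, the elementwise inequality (16) can be checked by brute force on a bounded part of the quantization lattice. The sketch below is only an illustration: the search radius, the two diagonal example matrices, and the function name are assumptions, and a bounded search can demonstrate the existence of nonzero equilibria but cannot rule them out beyond the searched box.

```python
from fractions import Fraction
from itertools import product

def nonzero_rounding_equilibria(A, eps, radius):
    """Nonzero lattice points x = k * eps, |k_i| <= radius, with
    -eps/2 < (A x)_i < eps/2 for every row i, as in (16)."""
    m = len(A)
    hits = []
    for k in product(range(-radius, radius + 1), repeat=m):
        if not any(k):
            continue                      # skip the zero vector
        x = [ki * eps for ki in k]
        Ax = [sum(a * xi for a, xi in zip(row, x)) for row in A]
        if all(-eps / 2 < v < eps / 2 for v in Ax):
            hits.append(tuple(x))
    return hits

eps = Fraction(1, 64)
# Hypothetical system matrices, chosen only to contrast the two cases.
A_well = [[Fraction(-1), Fraction(0)], [Fraction(0), Fraction(-1)]]
A_ill = [[Fraction(-1, 8), Fraction(0)], [Fraction(0), Fraction(-1)]]

hits_well = nonzero_rounding_equilibria(A_well, eps, radius=4)
hits_ill = nonzero_rounding_equilibria(A_ill, eps, radius=4)
print(len(hits_well), len(hits_ill))
```

In this example the matrix with a small first-row entry maps several nonzero lattice points close enough to zero to be rounded away, which is the ill-conditioning mechanism described in the second interpretation of (16).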


IV.  CONCLUSION 


Via a set of necessary conditions for global asymptotic stability, it has been shown that
high-speed, limit cycle free δ-operator implementations of linear discrete-time systems
cannot be realized. This is due to the tendency of such a realization to produce period one
limit cycles. This situation arises from small values in the finite difference being
quantized to zero. Hence, convergence to the 'wrong' equilibrium point is very likely.
Conditions on the system matrix and the sampling time that must hold if such limit cycle
behavior is to be avoided have been provided. The results indicate that, in high-speed
applications, these conditions cannot be satisfied with conventional quantization schemes.
Abstract: By developing the δ-operator analog of the Roesser model, state-space realization
of two- and multi-dimensional δ-systems is investigated. The corresponding notions of
gramians and balanced realization are also defined. It is shown that discrete-time system
implementation using this model can yield superior coefficient sensitivity properties.


I.  Introduction 

Judging by its performance in the one-dimensional (1-D) case [2], [5-6], one is led to
expect superior coefficient sensitivity and roundoff noise performance with δ-operator
implementation of two-dimensional (2-D) and multi-dimensional (m-D) discrete-time (DT)
systems. With this in mind, the δ-operator analog of the q-operator Roesser local state-space
(s.s.) model [12] is developed. We also propose the notions of gramians and balanced (BL)
realization. As expected, realization using this model can provide superior coefficient
sensitivity properties.


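II. Nomenclature and Preliminaries

A. Nomenclature

$\Re$: Reals; $\mathbb{C}$: Complex numbers; $\Re^{q \times p}$ and $\mathbb{C}^{q \times p}$:
Matrices of size $q \times p$ over $\Re$ and $\mathbb{C}$; $I_n$: $n \times n$ unit matrix;
$A^*$, trace[A], $\|A\|_F$: Conjugate transpose, trace, and Frobenius norm of matrix A;
$e_i$: Unit vector in $\Re^n$ with 1 on the i-th row; [...]

For q- and δ-systems, we use the indeterminates $z$ and $c$, respectively. For 1-D systems,
$\delta = (q - 1)/\tau \iff c = (z - 1)/\tau$, where $\tau$ is a positive real constant,
usually the sampling time. Let
$U^2_\delta = \{(c_h, c_v) \in \mathbb{C}^2 : |c_h + 1/\tau_h| < 1/\tau_h,\ |c_v + 1/\tau_v| < 1/\tau_v\}$.
$T^2_\delta$ is its boundary. The corresponding q-system regions are denoted with the
subscript q.

K.P. and P.H.B. gratefully acknowledge the support received from the Office of Naval Research
(ONR) through the grants N00014-94-1-0454 and N00014-94-1-0387, respectively.

B. Preliminaries

Consider a linear, shift-invariant, strictly causal, p-input q-output 2-D DT system. Its
$(n_h, n_v)$-order Roesser local s.s. model $\{\bar{A}, \bar{B}, \bar{C}, \bar{D}\}$ takes the
form [12]:

$$
\begin{bmatrix} q_h[x^h](i,j) \\ q_v[x^v](i,j) \end{bmatrix}
= [\bar{A}] \begin{bmatrix} x^h(i,j) \\ x^v(i,j) \end{bmatrix} + [\bar{B}]\, u(i,j);
\qquad
y(i,j) = [\bar{C}] \begin{bmatrix} x^h(i,j) \\ x^v(i,j) \end{bmatrix} + [\bar{D}]\, u(i,j),
\tag{2.1}
$$

where $u \in \Re^p$, $x^h \in \Re^{n_h}$, $x^v \in \Re^{n_v}$, and $y \in \Re^q$. $x^h$ and
$x^v$ are the h.p. and v.p. local state vectors. Take $n = n_h + n_v$. Also,

$$
q_h[x](i,j) = x(i+1,j); \qquad q_v[x](i,j) = x(i,j+1).
\tag{2.2}
$$

In what follows, we use matrix partitioning that conforms to
$\bar{A} = \begin{bmatrix} \bar{A}^{(1)} & \bar{A}^{(2)} \\ \bar{A}^{(3)} & \bar{A}^{(4)} \end{bmatrix}$,
$\bar{B} = \begin{bmatrix} \bar{B}^{(1)} \\ \bar{B}^{(2)} \end{bmatrix}$, and
$\bar{C} = \begin{bmatrix} \bar{C}^{(1)} & \bar{C}^{(2)} \end{bmatrix}$.
The corresponding 2-D characteristic equation and transfer function are
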


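$$
\det[I_z - \bar{A}] = 0; \qquad
\bar{H}(z_h, z_v) = \bar{C}(I_z - \bar{A})^{-1}\bar{B} + \bar{D},
\tag{2.3}
$$

where $I_z = z_h I_{n_h} \oplus z_v I_{n_v} \in \mathbb{C}^{n \times n}$. With no nonessential
singularities of the second kind (NSSK) on $T^2_q$,
$\{\bar{A}, \bar{B}, \bar{C}, \bar{D}\}$ is BIBO stable iff [3]

$$
\det[I_z - \bar{A}] \neq 0, \quad \forall (z_h, z_v) \in U^2_q.
\tag{2.4}
$$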

III.  2-D  6-Model 

A.  Local  s.s.  model 

Analogous  to  the  1-D  case,  define  6^1]  and  6v[  ]  as 

•^/ilxKi.i)  =  ^  ~  ^(^i)  _  <}h[x.]iij)  -  x(i,j) 

Th 

j)  =  +  1)  -  x(t,  j)  _  f/vlxKi.j)  -  x(z,j) 

Ty 

(3.1) 
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Here  r/^  and  Tv  are  positive  r^eal  constants  denoting  the  where 
‘sampling  times’  along  h.'p,  and  v.p.  directions,  respec¬ 
tively.  Note  that 


(0.0)<(fc,fc)<(.-,i)(^' 


B(l) 

0 


9/i  =  1  =  1  +  t„Sv, 

>  i 


and  letting  r  =  0  €  3?"’^", 


9t.[x”](t,i)  J 


—  I  +r  ® 


1 


^  (ij) 


Using  (3.3)  in  (2.1),  we  get 


y(».i)  =  [c] 


where  A  = 


A(i)  a(2) 
A(3)  a('‘) 


,  B  = 


5(1) 

B(2) 


(3.2) 


5(2)  J  )u(/i,*). 

Let  Ic  =  CftjTnfc  0  Cv/n„  €  Then,  the  2-D  6-' 

model’s  characteristic  equation  and  transfer  function  are 


det[/c  -  A]  =  — — det[/;,  -  A]U_c; 

denn  (3.9) 

H{c)i,Cv)  =  Zv)\k—^Cj 


(3.3) 


(3.4) 


where 


Zh  =  ^  +  Ti^Ck]  zt;  =  1  4- r^;Ct;.  (3.10) 


From  now  on,  the  variable  transformation  in  (3.10)  is 
denoted  by  c  — »-  z  or  z  — ►  c  whatever  is  appropriate. 
Nonsingular  transformations  of  the  type 


,  and  C  = 


x'‘(i,i) 


=  [T] 


x'‘(t,j) 

x'’(i,i) 


(3.11) 


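In addition, we need to perform

$$q_h[x^h] = x^h + \tau_h \cdot \delta_h[x^h]; \quad q_v[x^v] = x^v + \tau_v \cdot \delta_v[x^v]. \tag{3.5}$$

Here,

$$\bar{A} = I_n + \tau A; \quad \bar{B} = \tau B; \quad \bar{C} = C; \quad \bar{D} = D. \tag{3.6}$$

where $T = T^{(1)} \oplus T^{(4)}$ yield the equivalent 2-D s.s. realization $\{\hat{A}, \hat{B}, \hat{C}, \hat{D}\}$. Here,

$$\hat{A} = TAT^{-1}; \quad \hat{B} = TB; \quad \hat{C} = CT^{-1}; \quad \hat{D} = D. \tag{3.12}$$

With no NSSK on $T_\delta^2$, $\{A, B, C, D\}$ is BIBO stable iff

$$\det[I_c - A] \neq 0, \quad \forall (c_h, c_v) \in \bar{U}_\delta^2. \tag{3.13}$$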


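B. Properties of the 2-D δ-model

Most of the following properties may be derived in a manner that is exactly analogous to that in [12].

The transition matrix of the δ-model, $A^{i,j}$, may be recursively computed from

$$A^{i,j} = \begin{cases} 0, & (i,j) < (0,0); \\ I_n, & (i,j) = (0,0); \\ \begin{bmatrix} I_{n_h} & 0 \\ 0 & 0 \end{bmatrix} + \tau \begin{bmatrix} A^{(1)} & A^{(2)} \\ 0 & 0 \end{bmatrix}, & (i,j) = (1,0); \\ \begin{bmatrix} 0 & 0 \\ 0 & I_{n_v} \end{bmatrix} + \tau \begin{bmatrix} 0 & 0 \\ A^{(3)} & A^{(4)} \end{bmatrix}, & (i,j) = (0,1); \\ A^{1,0} A^{i-1,j} + A^{0,1} A^{i,j-1}, & \text{elsewhere.} \end{cases}$$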


The  general  response  of  the  6-model  is 


(3.7) 


C. Gramians

The gramians of 2-D q-systems are taken to be natural extensions of the integral expressions of their 1-D counterparts [11]. We will adopt a similar approach. In what follows, we consider the 1-D (or 2-D) stable δ-system $\{A, B, C, D\}$ with gramians $P$ and $Q$. The corresponding q-system is $\{\bar{A}, \bar{B}, \bar{C}, \bar{D}\}$ with gramians $\bar{P}$ and $\bar{Q}$.

1-D case. The gramians are defined in [10].

Definition 3.1. [10]. The gramians are the solutions to the Lyapunov equations

$$AP + PA^* + \tau \cdot APA^* = -BB^*;$$

$$A^*Q + QA + \tau \cdot A^*QA = -C^*C.$$

Lemma 3.1. The gramians satisfy the integral expressions


x'‘(f,;)' 

.x''(t',i) 


x^(0,  k) 
0 


k=o 


+  ^  A— 

/i  =  0 


0 

[x^(/i,0) 


-f  f(u), 


(3.8) 


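$$P = \frac{1}{2\pi j} \oint_{T_\delta} F F^* \frac{dc}{1 + \tau c}; \quad Q = \frac{1}{2\pi j} \oint_{T_\delta} G^* G \frac{dc}{1 + \tau c},$$

where $F(c) = (cI_n - A)^{-1}B$ and $G(c) = C(cI_n - A)^{-1}$. Moreover, $\bar{P} = \tau P$ and $\bar{Q} = Q/\tau$.

Proof. Substitute $\bar{A} = I_n + \tau A$, $\bar{B} = \tau B$, $\bar{C} = C$, and $\bar{D} = D$ [10] in the equations in Definition 3.1, and note the integral expressions for $\bar{P}$ and $\bar{Q}$ in [8]. ■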

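2-D case. With Lemma 3.1 in mind, we have

Definition 3.2. The gramians are

$$P = \frac{1}{(2\pi j)^2} \oint\!\!\oint_{T_\delta^2} F F^* \frac{dc_h}{1 + \tau_h c_h} \frac{dc_v}{1 + \tau_v c_v};$$

$$Q = \frac{1}{(2\pi j)^2} \oint\!\!\oint_{T_\delta^2} G^* G \frac{dc_h}{1 + \tau_h c_h} \frac{dc_v}{1 + \tau_v c_v},$$

where $P = \begin{bmatrix} P^{(1)} & P^{(2)} \\ P^{(3)} & P^{(4)} \end{bmatrix}$ and $Q = \begin{bmatrix} Q^{(1)} & Q^{(2)} \\ Q^{(3)} & Q^{(4)} \end{bmatrix}$.

Also, $F(c_h, c_v) = (I_c - A)^{-1}B = [f_1, \ldots, f_n]^*$ and $G(c_h, c_v) = C(I_c - A)^{-1} = [g_1, \ldots, g_n]$.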


Remarks. 

1. Note that $(I_c - A)^{-1}\big|_{c \to z} = (I_z - \bar{A})^{-1}\tau$, and

$$F\big|_{c \to z} = \bar{F}; \quad G\big|_{c \to z} = \bar{G} \cdot \tau. \tag{3.14}$$


when  {A,  P,C,  D}  is  locally  reachable  and  observable 
P(l),  PC-*),  Q(*),  and  QW  are  each  p.d. 

Separable systems. A separable (in denominator) 2-D q-system will have $\bar{A}^{(2)} = 0$ (and/or $\bar{A}^{(3)} = 0$), and all off-diagonal submatrices of $\bar{P}$ and $\bar{Q}$ are zero. The diagonal submatrices may be computed through two pairs of Lyapunov equations [11]. Clearly, a separable 2-D q-system yields a separable 2-D δ-system.

Theorem  3.6.  Let  {A,  B,  C,  D}  be  separable  with  A^^^  = 
0.  Then,  P(2)  =  Q(2)  =  0  and  P(3)  =  Q(3)  =  0,  and 

A(i)p(i)+p(i)^(i)*  +r/,A(i)p(i)A(i)* 

a(i)*q(i)  -I-  g(i)  a(i)  -f-  TA  a(i)*  a(i) 

=  -[c(i)  P('‘)a(3)]*  [c(i)  P(4)a(3)] /r„; 

A(4)p(4)  +  p(4)^(4)-  ^  ^„A('‘)p(‘')  aW* 

=  -[s(2)  A(3)5(1)]  [p(2)  ^(3)5(1) 

a(^)*  +  (?(“)  A^-^)  +  r„  A^-*)*  A^^) 


2. Definition 3.2 is completely analogous to the 1-D and 2-D q-systems [7], [11].

Lemma 3.2. $\bar{P} = \tau_h \tau_v P$ and $\bar{Q} = \tau_h \tau_v \, \tau^{-1} Q \tau^{-1}$.

Proof. Consider $P$ in Definition 3.2. Use $c \to z$, (3.14), and the definition of gramians for 2-D q-systems [11]. ■

The following are in complete analogy with 2-D q-systems.

Lemma  3.3.  The  gramians  may  be  represented  as 


P  = 


1 


1=0  J=0 


CO  oo 

EE  A*d*C*CA‘^‘  r, 

t=0  j  =  o 


0,  and,  for  (ij)  >  (0,0), 


Mij  z=  A‘-i>>r 


-I-  A‘d-ir 


0 

P(2) 


where,  for  (z,  j)  =  (0,  0),  Mij 
0 

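Lemma 3.4. Let $\{\hat{A}, \hat{B}, \hat{C}, \hat{D}\}$ with gramians $\hat{P}$ and $\hat{Q}$ be an equivalent system as in (3.10-11). Then, $\hat{P} = TPT^*$ and $\hat{Q} = T^{-*}QT^{-1}$. Moreover, the eigenvalues of $PQ$ and $\hat{P}\hat{Q}$ are invariant.

Definition 3.3. $\{A, B, C, D\}$ is said to be balanced if

$$P^{(1)} = Q^{(1)} = \Sigma^{(1)} = \mathrm{diag}\{\sigma_1^{(1)}, \sigma_2^{(1)}, \ldots, \sigma_{n_h}^{(1)}\} \quad \text{and}$$

$$P^{(4)} = Q^{(4)} = \Sigma^{(4)} = \mathrm{diag}\{\sigma_1^{(4)}, \sigma_2^{(4)}, \ldots, \sigma_{n_v}^{(4)}\}.$$

If the diagonal submatrices of $P$ and $Q$ are each positive definite (p.d.), a BL realization may be obtained [4]. Regarding this, we have

Lemma 3.5. Local reachability and observability of $\{A, B, C, D\}$ and $\{\bar{A}, \bar{B}, \bar{C}, \bar{D}\}$ are equivalent. Moreover,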


Here,  RW*  RW  ^  and  =  Tf,r.PW, 


IV.  Coefficient  Sensitivity 

By generalizing a certain sensitivity measure, Lutz and Hakimi [9] have addressed sensitivity minimization of MIMO 1-D CT systems. The SISO 2-D q-operator case is in [7]. In what follows, we study the coefficient sensitivity of the 2-D δ-model in section III. We follow a more direct approach using a Kronecker product formulation and, hence, the results are applicable to the more general MIMO case. Using [1], we may show

,  Cv)  =  [In  C]  '  U  nxn  *  [In  0  P]  (4.1) 

•5b(c/i,Cv)  —  [/n  O  C]  ■  Unxp  (4.2) 

•^0(^/1,  Cv)  —  bJqy^n  '  [In  ^  P]  (^*3) 

Soich^Cv)  =  (4.4) 

Lemma  The  quantities  in  (4. 1-4.4)  are  given  as 


5  a  = 


Sr  = 


5c  = 


gi 


'■sS" 


Sn 

f(i)* 


f(9)’ 


f  f ;  •  •  ■  f : 


„(?>) 

&n 

cDY 
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■^1,1  • 

*  *  R^lyP 

.■^9,1  • 

*  * 

Here,  denotes  a  X  p)  null  matrix  except  its  j-th 

row  which  is  f  *  and'  denotes  a  X  p)  null  matrix 
except  its  ^‘-th  column  which  is  g;. 

Proof. This may be shown through the results in [1] and simple yet tedious algebraic manipulations. ■

Corollary  .^.5.  The  quantities  5^^,  Ssj  Scj  and  S^)  of  the 
^-model  and  the  quantities  5^,  5^,  S^y  and  Sj^  of  the 
corresponding  9- model  are  related  by  Sy^lc— z  = 

|c— ‘^clc— ►E  —  and  *Sj9|c— 

where  T=  Thln^q 

Proof.  Apply  (3.14)  to  Lemma  4.1.  ■ 

To  proceed  further,  we  utilize  the  following 
Definition  Let  H{ch,Cv)  be  a  bivariate  matrix¬ 

valued  function  that  is  analytic  on  .  Then, 


1 


(p  ||//(c/i ,  Ct;)  |c— 11^ 

Jrf 


dzf^  dzy 


Remark. This norm is extensively utilized in related work [7] due mainly to the fact that it leads to tractable results. This, and our desire to make a comparison with the corresponding q-model, are the primary reasons for its use here.

We now define the absolute sensitivity measure

$$M = \|S_A\|_1^2 + \frac{1}{p}\|S_B\|_2^2 + \frac{1}{q}\|S_C\|_2^2 + \frac{1}{pq}\|S_D\|_2^2. \tag{4.5}$$

Remarks.

1. The use of different norms is for mathematical feasibility and tractability [7], [5].

2. The weights associated with each term in (4.5) may be thought of as averaging factors per input/output.

3. Due to (3.5), $M$ should contain $\|S_{\tau_h}\|$ and $\|S_{\tau_v}\|$. However, we assume that $\tau_h$ and $\tau_v$ are selected such that each possesses an exact binary representation. Hence, these additional terms are neglected.

Using an argument similar to that in [7], one may show the following:

$$\|S_A\|_1^2 \le \mathrm{trace}[P] \cdot \mathrm{trace}[\tau Q \tau] \tag{4.6}$$

$$\|S_B\|_2^2 = p \cdot \mathrm{trace}[\tau Q \tau] \tag{4.7}$$

$$\|S_C\|_2^2 = q \cdot \mathrm{trace}[P] \tag{4.8}$$

$$\|S_D\|_2^2 = pq \tag{4.9}$$

Combining (4.5) with (4.6-9), we get

$$M \le \bar{M} = (\mathrm{trace}[P] + 1)(\mathrm{trace}[\tau Q \tau] + 1). \tag{4.10}$$


It is customary to perform a minimization of $\bar{M}$. Hence, one attempts to characterize those $\{A, B, C, D\}$ that are 'bound optimal' with respect to $M$. Analogous to the 2-D q-systems case [7], one may for instance show that a BL realization (modulo an orthogonal nonsingular transformation) is 'bound optimal' with respect to $M$.

Compared to a q-system, its δ-system counterpart yields a smaller $\bar{M}$ whenever $\mathrm{trace}[Q] > \mathrm{trace}[\tau Q \tau]$, that is,

$$(1 - \tau_h^2) \cdot \mathrm{trace}[Q^{(1)}] + (1 - \tau_v^2) \cdot \mathrm{trace}[Q^{(4)}] > 0. \tag{4.11}$$

Note that, with the local reachability and observability assumption of $\{A, B, C, D\}$, positive definiteness of the diagonal gramian submatrices, in particular $Q^{(1)}$ and $Q^{(4)}$, is guaranteed. Both traces in (4.11) are then positive. Thus, (4.11) is satisfied if $\tau_h < 1$ and $\tau_v < 1$.
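As a small illustration of condition (4.11), the following Python sketch evaluates the left-hand side for given sampling constants and gramian traces. The function name and its arguments are hypothetical and chosen only for this example; the traces are assumed to have been computed elsewhere.

```python
def delta_bound_is_smaller(tau_h, tau_v, trace_q1, trace_q4):
    """Check condition (4.11):
    (1 - tau_h^2) * trace[Q1] + (1 - tau_v^2) * trace[Q4] > 0.

    trace_q1 and trace_q4 are the traces of the diagonal gramian
    blocks Q^(1) and Q^(4) (assumed positive, as for p.d. blocks).
    """
    lhs = (1.0 - tau_h ** 2) * trace_q1 + (1.0 - tau_v ** 2) * trace_q4
    return lhs > 0.0


# Both sampling constants below one: the condition holds.
print(delta_bound_is_smaller(0.25, 0.25, 1.0, 2.0))
# One constant above one can still satisfy (4.11) if the other term dominates.
print(delta_bound_is_smaller(0.5, 1.2, 3.0, 1.0))
```

The second call suggests that $\tau_h < 1$ and $\tau_v < 1$ is a sufficient condition rather than a necessary one.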

VII.  Conclusion 

We have developed the δ-operator analog of the Roesser local s.s. model. Notions of gramians and BL realization are also proposed. As expected, under mild conditions, this model offers superior coefficient sensitivity properties.
ABSTRACT

Delta-operator based implementations can avoid the numerical ill-conditioning usually associated with high speed shift-operator based implementations of discrete-time systems. Moreover, they provide a unified methodology for tackling both continuous- and discrete-time systems. In particular, it has been shown that delta-operator based balanced realizations can offer superior coefficient sensitivity properties under fixed-point arithmetic. In this work, we address computation of balanced realizations. For this purpose, given a discrete-time system, the relationship between its shift- and delta-operator formulated balanced realizations is presented.

I. INTRODUCTION

Current interest in delta-systems (δ-systems) is due mainly to two reasons: (a) δ-systems provide superior roundoff noise [1-2] and coefficient sensitivity [3-4] properties, and (b) the δ-operator makes it possible to treat both continuous-time (CT) and discrete-time (DT) systems in a unified manner [5]. Recent work on δ-operator based implementation of two-dimensional (2-D) DT systems contains the counterpart to the shift-operator (q-operator) based Roesser local state-space (s.s.) model [6]. Balanced (BL) realization of such models and coefficient sensitivity properties were also investigated. Indeed, given a 2-D DT system, under fixed-point (FXP) arithmetic (and mild conditions), the Roesser δ-model was shown to be superior to the Roesser q-model. In this paper, we reveal the relationship between BL realizations of Roesser δ- and q-models. This makes it possible to use techniques available for computation of q-BL models for computation of δ-BL models.
II. NOMENCLATURE AND PRELIMINARIES
2.1. Nomenclature

$\Re$, $\mathcal{C}$, and $\mathcal{N}$ denote the reals, complex numbers, and nonnegative integers, respectively. $\Re^{q \times p}$ and $\mathcal{C}^{q \times p}$ are the sets of matrices of size $q \times p$ over $\Re$ and $\mathcal{C}$, respectively. $I_n$ is the unit matrix of size $n \times n$; $0$ is the null matrix of size $q \times p$. $A^*$ and $A^T$ denote the complex conjugate transpose and transpose of matrix $A \in \mathcal{C}^{q \times p}$; $\mathrm{trace}[A]$ and $\lambda_i[A]$ denote its trace and $i$-th eigenvalue. $\|A\|_F$ is its Frobenius norm.

In the 1-D case, corresponding q- and δ-systems are related by $\delta = (q - 1)/\Delta \iff c = (z - 1)/\Delta$. Here, $\Delta$ is a positive real constant (usually the sampling time). For 2-D systems, subscripts $h$ and $v$ denote horizontally propagating (h.p.) and vertically propagating (v.p.) subsystems of the corresponding Roesser local s.s. models. $n_h$ and $n_v$ denote the sizes of these h.p. and v.p. subsystems. We use $n$ to denote $n = n_h + n_v$. $\Delta_h$ and $\Delta_v$ are positive real constants denoting 'sampling times' along h.p. and v.p. directions.

We use $\xi$ to denote $\Delta_h I_{n_h} \oplus \Delta_v I_{n_v} \in \Re^{n \times n}$. Also, $I_z$ and $I_c$ denote $z_h I_{n_h} \oplus z_v I_{n_v} \in \mathcal{C}^{n \times n}$ and $c_h I_{n_h} \oplus c_v I_{n_v} \in \mathcal{C}^{n \times n}$, respectively.

Corresponding 2-D q- and δ-systems are related by $\delta_h = (q_h - 1)/\Delta_h \iff c_h = (z_h - 1)/\Delta_h$ and $\delta_v = (q_v - 1)/\Delta_v \iff c_v = (z_v - 1)/\Delta_v$. We use subscripts $\delta$ and $q$ to differentiate between corresponding δ- and q-systems; for example, the s.s. realization of a given DT system is either $\{A_\delta, B_\delta, C_\delta, D_\delta\}$ if implemented based on the δ-operator or $\{A_q, B_q, C_q, D_q\}$ if implemented based on the q-operator. The following notation is also used: $H(c_h, c_v)\big|_{c \to z}$ denotes $H(c_h, c_v)$ evaluated at $c_h = (z_h - 1)/\Delta_h$ and $c_v = (z_v - 1)/\Delta_v$.


Stability studies of q- and δ-systems involve the following regions: $U_q = \{z \in \mathcal{C} : |z| < 1\}$; $U_q^2 = \{(z_h, z_v) \in \mathcal{C}^2 : |z_h| < 1, |z_v| < 1\}$; $U_\delta = \{c \in \mathcal{C} : |c + 1/\Delta| < 1/\Delta\}$; $U_\delta^2 = \{(c_h, c_v) \in \mathcal{C}^2 : |c_h + 1/\Delta_h| < 1/\Delta_h, |c_v + 1/\Delta_v| < 1/\Delta_v\}$. The corresponding distinguished boundaries are denoted with the letter $T$; $U \cup T$ is denoted by $\bar{U}$. A q-system polynomial with all its roots in $U_q$ (for the 1-D case) or $U_q^2$ (for the 2-D case) is said to be stable. The corresponding regions for a δ-system polynomial are $U_\delta$ (for the 1-D case) and $U_\delta^2$ (for the 2-D case), respectively.
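The correspondence between the two stability regions can be checked numerically. The following Python sketch, with hypothetical helper names, maps a point $z$ to $c = (z - 1)/\Delta$ and tests membership in $U_q$ and $U_\delta$; it is meant only as an illustration of the 1-D definitions above.

```python
def z_to_c(z, delta):
    """Map a q-domain point z to the delta-domain point c = (z - 1)/delta."""
    return (z - 1.0) / delta


def in_U_q(z):
    """Open unit disc: |z| < 1."""
    return abs(z) < 1.0


def in_U_delta(c, delta):
    """Disc of radius 1/delta centred at -1/delta: |c + 1/delta| < 1/delta."""
    return abs(c + 1.0 / delta) < 1.0 / delta


delta = 0.25
for z in (0.5 + 0.5j, -0.9, 1.1, 0.2 - 1.2j):
    c = z_to_c(z, delta)
    # Membership agrees because |c + 1/delta| = |z|/delta.
    print(z, in_U_q(z), in_U_delta(c, delta))
```

Since $c + 1/\Delta = z/\Delta$, a point lies in $U_q$ exactly when its image lies in $U_\delta$, which is what the loop prints for each sample point.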

2.2.  Preliminaries 

First,  we  provide  a  brief  overview  of  relevant  material, 

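Roesser q-model. The 2-D dynamical system under consideration is assumed to be linear, shift-invariant, strictly causal, and modeled by a set of first-order vector difference equations over $\Re$. Given such a $p$-input and $q$-output system, its $n_h$h-$n_v$v Roesser local s.s. model $\{A_q, B_q, C_q, D_q\}$ is of the form [7]

$$\begin{bmatrix} q_h[x^h] \\ q_v[x^v] \end{bmatrix}(i,j) = \begin{bmatrix} A_q^{(1)} & A_q^{(2)} \\ A_q^{(3)} & A_q^{(4)} \end{bmatrix} \begin{bmatrix} x^h \\ x^v \end{bmatrix}(i,j) + \begin{bmatrix} B_q^{(1)} \\ B_q^{(2)} \end{bmatrix} u(i,j);$$

$$y(i,j) = \begin{bmatrix} C_q^{(1)} & C_q^{(2)} \end{bmatrix} \begin{bmatrix} x^h \\ x^v \end{bmatrix}(i,j) + [D_q]\, u(i,j), \tag{2.1}$$

where $u \in \Re^p$, $x^h \in \Re^{n_h}$, $x^v \in \Re^{n_v}$, and $y \in \Re^q$. Also,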

^(1)  g  s)jn,xn._  ^(4)  g  g  ^2 


CsjDs}  has  been  proposed  [6]: 


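$$\begin{bmatrix} \delta_h[x^h] \\ \delta_v[x^v] \end{bmatrix}(i,j) = \begin{bmatrix} A_\delta^{(1)} & A_\delta^{(2)} \\ A_\delta^{(3)} & A_\delta^{(4)} \end{bmatrix} \begin{bmatrix} x^h \\ x^v \end{bmatrix}(i,j) + \begin{bmatrix} B_\delta^{(1)} \\ B_\delta^{(2)} \end{bmatrix} u(i,j);$$

$$y(i,j) = \begin{bmatrix} C_\delta^{(1)} & C_\delta^{(2)} \end{bmatrix} \begin{bmatrix} x^h \\ x^v \end{bmatrix}(i,j) + [D_\delta]\, u(i,j), \tag{2.6}$$

where

$$A_\delta = \xi^{-1}(A_q - I_n); \quad B_\delta = \xi^{-1}B_q; \quad C_\delta = C_q; \quad D_\delta = D_q. \tag{2.7}$$

Here, $\xi = [\Delta_h I_{n_h} \oplus \Delta_v I_{n_v}] \in \Re^{n \times n}$. Note that, as opposed to its corresponding Roesser q-model, here, one must also perform the following computations:

$$q_h[x^h](i,j) = x^h(i,j) + \Delta_h \cdot \delta_h[x^h](i,j); \quad q_v[x^v](i,j) = x^v(i,j) + \Delta_v \cdot \delta_v[x^v](i,j). \tag{2.8}$$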
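The conversion in (2.7) amounts to subtracting the identity and scaling each row by the reciprocal of the corresponding sampling constant. A minimal Python sketch follows, assuming matrices stored as nested lists and a diagonal scaling matrix (written here as $\xi$) with $\Delta_h$ on the first $n_h$ diagonal entries and $\Delta_v$ on the rest; the function name `q_to_delta` is hypothetical.

```python
def q_to_delta(a_q, b_q, n_h, delta_h, delta_v):
    """Apply (2.7): A_d = xi^{-1} (A_q - I_n), B_d = xi^{-1} B_q.

    a_q is n x n and b_q is n x p (nested lists). C and D carry over
    unchanged, so they are not handled here.
    """
    n = len(a_q)
    # Diagonal of xi: delta_h for h.p. states, delta_v for v.p. states.
    xi = [delta_h] * n_h + [delta_v] * (n - n_h)
    a_d = [[(a_q[i][j] - (1.0 if i == j else 0.0)) / xi[i]
            for j in range(n)] for i in range(n)]
    b_d = [[value / xi[i] for value in b_q[i]] for i in range(n)]
    return a_d, b_d


# Small illustrative system with one h.p. state and one v.p. state.
a_d, b_d = q_to_delta([[0.5, 0.1], [0.0, 0.75]], [[1.0], [1.0]],
                      n_h=1, delta_h=0.25, delta_v=0.5)
print(a_d)
print(b_d)
```

Because $\xi$ is diagonal, no matrix inversion is needed; each row is simply divided by its own sampling constant.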

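In [6], several properties of this Roesser δ-model (such as the general response equation, transition matrix, characteristic equation, and transfer function) are elaborated. Also, it is easy to see that, as for the q-model, 2-D equivalent transformations of the type

$$\begin{bmatrix} \hat{x}^h \\ \hat{x}^v \end{bmatrix}(i,j) = \begin{bmatrix} T^{(1)} & 0 \\ 0 & T^{(4)} \end{bmatrix} \begin{bmatrix} x^h \\ x^v \end{bmatrix}(i,j) = [T] \begin{bmatrix} x^h \\ x^v \end{bmatrix}(i,j), \tag{2.9}$$

where $T^{(1)} \in \Re^{n_h \times n_h}$ and $T^{(4)} \in \Re^{n_v \times n_v}$ are nonsingular, yield an equivalent 2-D s.s. realization $\{\hat{A}_\delta, \hat{B}_\delta, \hat{C}_\delta, \hat{D}_\delta\}$, where

$$\hat{A}_\delta = TA_\delta T^{-1}; \quad \hat{B}_\delta = TB_\delta; \quad \hat{C}_\delta = C_\delta T^{-1}; \quad \hat{D}_\delta = D_\delta. \tag{2.10}$$

$$q_h[x](i,j) = x(i+1,j) \quad \text{and} \quad q_v[x](i,j) = x(i,j+1). \tag{2.2}$$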


Usually, $x^h$ and $x^v$ are called the h.p. and v.p. local state vectors of $\{A_q, B_q, C_q, D_q\}$. With no nonessential singularities of the second kind on $T_q^2$, for BIBO stability, one requires [8]

$$\det[I_z - A_q] \neq 0, \quad \forall (z_h, z_v) \in \bar{U}_q^2. \tag{2.3}$$


Roesser δ-model. To exploit the superior finite wordlength properties of δ-operator implementations, analogous to the 1-D case, in [6], the following operators are defined:

$$\delta_h[x](i,j) = \frac{x(i+1,j) - x(i,j)}{\Delta_h} = \frac{q_h[x](i,j) - x(i,j)}{\Delta_h};$$

$$\delta_v[x](i,j) = \frac{x(i,j+1) - x(i,j)}{\Delta_v} = \frac{q_v[x](i,j) - x(i,j)}{\Delta_v}, \tag{2.4}$$

where $\Delta_h$ and $\Delta_v$ are two positive real numbers. Hence, the following relationships are applicable:

$$\delta_h = \frac{q_h - 1}{\Delta_h}; \quad \delta_v = \frac{q_v - 1}{\Delta_v}. \tag{2.5}$$

Using  (2.4-5)  in  (2.1),  the  following  Roesser  6-model  {As,Bs, 


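Also, $\{A_\delta, B_\delta, C_\delta, D_\delta\}$ and $\{\hat{A}_\delta, \hat{B}_\delta, \hat{C}_\delta, \hat{D}_\delta\}$ have the same transfer function. With no nonessential singularities of the second kind on $T_\delta^2$, for BIBO stability, one requires

$$\det[I_c - A_\delta] \neq 0, \quad \forall (c_h, c_v) \in \bar{U}_\delta^2. \tag{2.11}$$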


III. GRAMIANS AND BL REALIZATIONS
3.1. Gramians

For the Roesser q-model, gramians are taken to be natural extensions of the integral expressions of their 1-D counterparts [9-10]. The work in [6] adopts a similar approach in proposing gramians for the δ-operator case as defined in [5]. In what follows, $\{A_\delta, B_\delta, C_\delta, D_\delta\}$ (with gramians $P_\delta$ and $Q_\delta$) and $\{A_q, B_q, C_q, D_q\}$ (with gramians $P_q$ and $Q_q$) denote a given stable 2-D DT system's δ- and q-operator based Roesser models, respectively.

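Definition 3.1. [9-10].

1. Gramians of $\{A_q, B_q, C_q, D_q\}$ are

$$P_q = \frac{1}{(2\pi j)^2} \oint\!\!\oint_{T_q^2} F_q F_q^* \frac{dz_h}{z_h} \frac{dz_v}{z_v}; \quad Q_q = \frac{1}{(2\pi j)^2} \oint\!\!\oint_{T_q^2} G_q^* G_q \frac{dz_h}{z_h} \frac{dz_v}{z_v},$$

where $F_q(z_h, z_v) = (I_z - A_q)^{-1}B_q$ and $G_q(z_h, z_v) = C_q(I_z - A_q)^{-1}$.

2. Gramians of $\{A_\delta, B_\delta, C_\delta, D_\delta\}$ are

$$P_\delta = \frac{1}{(2\pi j)^2} \oint\!\!\oint_{T_\delta^2} F_\delta F_\delta^* \frac{dc_h}{1 + \Delta_h c_h} \frac{dc_v}{1 + \Delta_v c_v}; \quad Q_\delta = \frac{1}{(2\pi j)^2} \oint\!\!\oint_{T_\delta^2} G_\delta^* G_\delta \frac{dc_h}{1 + \Delta_h c_h} \frac{dc_v}{1 + \Delta_v c_v},$$

where $F_\delta(c_h, c_v) = (I_c - A_\delta)^{-1}B_\delta$ and $G_\delta(c_h, c_v) = C_\delta(I_c - A_\delta)^{-1}$.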

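Lemma 3.1. [6]. The relationships between the above gramians are

$$P_\delta = \frac{1}{\Delta_h \Delta_v} P_q; \quad Q_\delta = \frac{1}{\Delta_h \Delta_v}\, \xi Q_q \xi.$$

With appropriate partitions incorporated, this is equivalent to

$$\begin{bmatrix} P_\delta^{(1)} & P_\delta^{(2)} \\ P_\delta^{(3)} & P_\delta^{(4)} \end{bmatrix} = \frac{1}{\Delta_h \Delta_v} \begin{bmatrix} P_q^{(1)} & P_q^{(2)} \\ P_q^{(3)} & P_q^{(4)} \end{bmatrix}; \quad \begin{bmatrix} Q_\delta^{(1)} & Q_\delta^{(2)} \\ Q_\delta^{(3)} & Q_\delta^{(4)} \end{bmatrix} = \begin{bmatrix} \frac{\Delta_h}{\Delta_v} Q_q^{(1)} & Q_q^{(2)} \\ Q_q^{(3)} & \frac{\Delta_v}{\Delta_h} Q_q^{(4)} \end{bmatrix}.$$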

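For convenience, we use the following notation:

$\{A, B, C, D\} \xrightarrow{T} \{\hat{A}, \hat{B}, \hat{C}, \hat{D}\}$: Here, $\hat{A} = TAT^{-1}$, $\hat{B} = TB$, $\hat{C} = CT^{-1}$, and $\hat{D} = D$, where $T$ is of type (2.9-10).

$\{A_q, B_q, C_q, D_q\} \longrightarrow \{A_\delta, B_\delta, C_\delta, D_\delta\}$: This is the corresponding δ-system obtained by applying (2.7).

$\{A_\delta, B_\delta, C_\delta, D_\delta\} \longrightarrow \{A_q, B_q, C_q, D_q\}$: This is the corresponding q-system obtained by applying (2.7).

Moreover, we use the following:

$\{A_{qB}, B_{qB}, C_{qB}, D_{qB}\}$: BL realization of $\{A_q, B_q, C_q, D_q\}$ obtained via $\{A_q, B_q, C_q, D_q\} \xrightarrow{T_q} \{A_{qB}, B_{qB}, C_{qB}, D_{qB}\}$.

$\{A_{\delta B}, B_{\delta B}, C_{\delta B}, D_{\delta B}\}$: BL realization of $\{A_\delta, B_\delta, C_\delta, D_\delta\}$ obtained via $\{A_\delta, B_\delta, C_\delta, D_\delta\} \xrightarrow{T_\delta} \{A_{\delta B}, B_{\delta B}, C_{\delta B}, D_{\delta B}\}$.

$\{A_{\delta B2q}, B_{\delta B2q}, C_{\delta B2q}, D_{\delta B2q}\}$: q-system obtained via $\{A_{\delta B}, B_{\delta B}, C_{\delta B}, D_{\delta B}\} \longrightarrow \{A_{\delta B2q}, B_{\delta B2q}, C_{\delta B2q}, D_{\delta B2q}\}$.

$\{A_{qB2\delta}, B_{qB2\delta}, C_{qB2\delta}, D_{qB2\delta}\}$: δ-system obtained via $\{A_{qB}, B_{qB}, C_{qB}, D_{qB}\} \longrightarrow \{A_{qB2\delta}, B_{qB2\delta}, C_{qB2\delta}, D_{qB2\delta}\}$.

Lemma 3.3. The following relationships are true:

$$\{A_q, B_q, C_q, D_q\} \xrightarrow{T_\delta} \{A_{\delta B2q}, B_{\delta B2q}, C_{\delta B2q}, D_{\delta B2q}\};$$

$$\{A_\delta, B_\delta, C_\delta, D_\delta\} \xrightarrow{T_q} \{A_{qB2\delta}, B_{qB2\delta}, C_{qB2\delta}, D_{qB2\delta}\}.$$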


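Lemma 3.2. [6]. The realization $\{\hat{A}_\delta, \hat{B}_\delta, \hat{C}_\delta, \hat{D}_\delta\}$ obtained with a nonsingular transformation of the type in (2.9-10) yields the gramians $\hat{P}_\delta = TP_\delta T^*$ and $\hat{Q}_\delta = T^{-*}Q_\delta T^{-1}$. Eigenvalues of $P_\delta Q_\delta$ are invariant under such a transformation. The situation regarding the Roesser q-model is completely equivalent.

Definition 3.2. [10]. Roesser δ-model $\{A_\delta, B_\delta, C_\delta, D_\delta\}$ is said to be balanced (BL) if

$$P_\delta^{(1)} = Q_\delta^{(1)} = \Sigma_\delta^{(1)} = \mathrm{diag}\{\sigma_1^{(1)}, \ldots, \sigma_{n_h}^{(1)}\};$$

$$P_\delta^{(4)} = Q_\delta^{(4)} = \Sigma_\delta^{(4)} = \mathrm{diag}\{\sigma_1^{(4)}, \ldots, \sigma_{n_v}^{(4)}\}.$$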


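Proof. Note that $A_{\delta B2q} = I_n + \xi A_{\delta B} = I_n + \xi T_\delta A_\delta T_\delta^{-1} = I_n + \xi T_\delta \xi^{-1}(A_q - I_n)T_\delta^{-1} = T_\delta A_q T_\delta^{-1}$, since $\xi T_\delta = T_\delta \xi$.

The remainder may be proven in a similar manner. ■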

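Lemma 3.4. The following relationships are true:

$$\{A_{\delta B2q}, B_{\delta B2q}, C_{\delta B2q}, D_{\delta B2q}\} \xrightarrow{\xi^{-1/2}} \{A_{qB}, B_{qB}, C_{qB}, D_{qB}\};$$

$$\{A_{qB2\delta}, B_{qB2\delta}, C_{qB2\delta}, D_{qB2\delta}\} \xrightarrow{\xi^{1/2}} \{A_{\delta B}, B_{\delta B}, C_{\delta B}, D_{\delta B}\}.$$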

We refer to $\sigma_i^{(1)}$, $i = 1, \ldots, n_h$, and $\sigma_j^{(4)}$, $j = 1, \ldots, n_v$, as the Hankel singular values of the h.p. and v.p. subsystems, respectively. The situation regarding the Roesser q-model is completely equivalent.

3.2.  Computation  of  BL  Realizations 

Computation of gramians and obtaining BL realizations for q-systems have been investigated quite thoroughly. In the 1-D and 2-D separable cases, one may solve Lyapunov equations and use Laub's algorithm [10-11]. In the 2-D non-separable case, this computation is not that easy; however, several techniques have been developed [10], [12].

In this section, we provide the relationship between BL realizations of corresponding δ- and q-models. This allows all available techniques for gramian computation of q-systems to be utilized for δ-systems as well. To the authors' knowledge, such a relationship is not available even for the 1-D case. Although we concentrate on the 2-D case, a similar argument may be developed for the 1-D case.


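Proof. Note that $\{A_{\delta B}, B_{\delta B}, C_{\delta B}, D_{\delta B}\}$ has the following gramians:

$$P_{\delta B} = \begin{bmatrix} \Sigma_\delta^{(1)} & P_{\delta B}^{(2)} \\ P_{\delta B}^{(3)} & \Sigma_\delta^{(4)} \end{bmatrix}; \quad Q_{\delta B} = \begin{bmatrix} \Sigma_\delta^{(1)} & Q_{\delta B}^{(2)} \\ Q_{\delta B}^{(3)} & \Sigma_\delta^{(4)} \end{bmatrix}.$$

Hence, from Lemma 3.1, $\{A_{\delta B2q}, B_{\delta B2q}, C_{\delta B2q}, D_{\delta B2q}\}$ has the following gramians:

$$P_{\delta B2q} = \Delta_h \Delta_v \begin{bmatrix} \Sigma_\delta^{(1)} & P_{\delta B}^{(2)} \\ P_{\delta B}^{(3)} & \Sigma_\delta^{(4)} \end{bmatrix}; \quad Q_{\delta B2q} = \begin{bmatrix} \frac{\Delta_v}{\Delta_h} \Sigma_\delta^{(1)} & Q_{\delta B}^{(2)} \\ Q_{\delta B}^{(3)} & \frac{\Delta_h}{\Delta_v} \Sigma_\delta^{(4)} \end{bmatrix}.$$

To get $\{A_{qB}, B_{qB}, C_{qB}, D_{qB}\}$, we need to simultaneously diagonalize the two pairs $\{\Delta_h \Delta_v \Sigma_\delta^{(1)}, (\Delta_v/\Delta_h)\Sigma_\delta^{(1)}\}$ and $\{\Delta_h \Delta_v \Sigma_\delta^{(4)}, (\Delta_h/\Delta_v)\Sigma_\delta^{(4)}\}$. By applying Laub's algorithm, we get these two transformations to be $\Delta_h^{-1/2} I_{n_h}$ and $\Delta_v^{-1/2} I_{n_v}$. This proves the first part. The remainder follows in a similar manner. ■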

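Corollary 3.5. The systems $\{A_{qB}, B_{qB}, C_{qB}, D_{qB}\}$ and $\{A_{\delta B}, B_{\delta B}, C_{\delta B}, D_{\delta B}\}$ are related as follows:

$$A_{\delta B} = \xi^{-1/2}(A_{qB} - I_n)\xi^{-1/2}; \quad B_{\delta B} = \xi^{-1/2}B_{qB}; \quad C_{\delta B} = C_{qB}\xi^{-1/2}; \quad D_{\delta B} = D_{qB}.$$

Proof. Note that, from Lemma 3.4, $A_{\delta B} = \xi^{-1}(A_{\delta B2q} - I_n) = \xi^{-1}(\xi^{1/2}A_{qB}\xi^{-1/2} - I_n) = \xi^{-1/2}(A_{qB} - I_n)\xi^{-1/2}$. The rest follows in a similar manner. ■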
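Corollary 3.5 can be exercised with a few lines of Python. The sketch below is an illustration under stated assumptions: matrices are nested lists, the diagonal scaling (written here as $\xi$) has $\Delta_h$ on the first $n_h$ entries and $\Delta_v$ on the rest, and the function name `bl_q_to_bl_delta` is hypothetical. It takes a balanced q-realization, however obtained, and returns the corresponding balanced δ-realization.

```python
import math


def bl_q_to_bl_delta(a_qb, b_qb, c_qb, n_h, delta_h, delta_v):
    """Corollary 3.5 for diagonal xi:
    A_dB = xi^{-1/2} (A_qB - I) xi^{-1/2},
    B_dB = xi^{-1/2} B_qB,  C_dB = C_qB xi^{-1/2}  (D is unchanged).
    """
    n = len(a_qb)
    # Diagonal entries of xi^{-1/2}.
    s = [1.0 / math.sqrt(delta_h)] * n_h + [1.0 / math.sqrt(delta_v)] * (n - n_h)
    a_db = [[s[i] * (a_qb[i][j] - (1.0 if i == j else 0.0)) * s[j]
             for j in range(n)] for i in range(n)]
    b_db = [[s[i] * value for value in b_qb[i]] for i in range(n)]
    c_db = [[row[j] * s[j] for j in range(n)] for row in c_qb]
    return a_db, b_db, c_db


# h.p. block of the balanced q-model from the example, Delta_h = 0.25.
a_qb = [[0.86478, 0.26806, -0.034799],
        [-0.26806, 0.58766, 0.38402],
        [-0.034797, -0.38401, 0.45427]]
b_qb = [[0.063568], [0.049879], [0.018565]]
c_qb = [[0.65595, 0.51555, 0.19416]]
a_db, b_db, c_db = bl_q_to_bl_delta(a_qb, b_qb, c_qb, 3, 0.25, 0.25)
print(a_db[0], b_db[0], c_db[0])
```

With $\Delta_h = 0.25$ the scaling factor is 2, so the first entry of the result is close to the tabulated value of about -5.4089e-01 for the balanced δ-model, up to the rounding of the printed five-digit coefficients.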


IV.  EXAMPLE 

We now consider a stable 3h-3v 2-D separable digital filter.
4.1. Computations

Numerical values are displayed via FORMAT SHORT E of MATLAB [13], which was used for all computations. Note that, since the system being considered has $A_q^{(3)} = 0$ (instead of $A_q^{(2)} = 0$), relevant equations must be appropriately modified.


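Given q-model $\{A_q, B_q, C_q, D_q\}$.

$$A_q^{(1)} = \begin{bmatrix} 0 & 1.0000e+00 & 0 \\ 0 & 0 & 1.0000e+00 \\ 3.8315e-01 & -1.3861e+00 & 1.9067e+00 \end{bmatrix};$$

$$A_q^{(2)} = \begin{bmatrix} -6.8280e-02 & 6.1900e-02 & 6.5400e-03 \\ -2.8100e-02 & 3.9560e-02 & -2.2480e-02 \\ 1.2445e+00 & -5.7092e-01 & 2.0587e+00 \end{bmatrix}; \quad A_q^{(3)} = 0;$$

$$A_q^{(4)} = \begin{bmatrix} 0 & 1.0000e+00 & 0 \\ 0 & 0 & 1.0000e+00 \\ 3.8238e-01 & -1.3818e+00 & 1.9025e+00 \end{bmatrix};$$

$B_q^{(1)} = [0 \;\; 0 \;\; 1]^T$;

$B_q^{(2)} = [0 \;\; 0 \;\; 1]^T$;

$C_q^{(1)} = [1.1410e-02 \;\; -5.4000e-03 \;\; 1.9560e-02]$;

$C_q^{(2)} = [1.1640e-02 \;\; -5.4500e-03 \;\; 1.9600e-02]$;

$D_q = [9.4300e-03]$.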


BL q-model $\{A_{qB}, B_{qB}, C_{qB}, D_{qB}\}$.

$$A_{qB}^{(1)} = \begin{bmatrix} 8.6478e-01 & 2.6806e-01 & -3.4799e-02 \\ -2.6806e-01 & 5.8766e-01 & 3.8402e-01 \\ -3.4797e-02 & -3.8401e-01 & 4.5427e-01 \end{bmatrix};$$

$$A_{qB}^{(2)} = \begin{bmatrix} 4.2940e-01 & -3.3765e-01 & 1.2689e-01 \\ 3.3771e-01 & -2.6511e-01 & 1.0134e-01 \\ 1.2732e-01 & -9.7518e-02 & 3.2423e-02 \end{bmatrix}; \quad A_{qB}^{(3)} = 0;$$

$$A_{qB}^{(4)} = \begin{bmatrix} 8.6486e-01 & 2.6760e-01 & -3.4949e-02 \\ -2.6760e-01 & 5.8692e-01 & 3.8661e-01 \\ -3.4952e-02 & -3.8661e-01 & 4.5071e-01 \end{bmatrix};$$

$B_{qB}^{(1)} = [6.3568e-02 \;\; 4.9879e-02 \;\; 1.8565e-02]^T$;

$C_{qB}^{(1)} = [6.5595e-01 \;\; 5.1555e-01 \;\; 1.9416e-01]$;

$B_{qB}^{(2)} = [6.5590e-01 \;\; -5.1574e-01 \;\; 1.9341e-01]^T$;

$C_{qB}^{(2)} = [6.3592e-02 \;\; -4.9875e-02 \;\; 1.8540e-02]$;

$D_{qB} = [9.4300e-03]$.


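Corresponding δ-model $\{A_\delta, B_\delta, C_\delta, D_\delta\}$. Let us select $\Delta_h = \Delta_v = 2.5000e-01$. Accordingly, we get

$$A_\delta^{(1)} = \begin{bmatrix} -4.0000e+00 & 4.0000e+00 & 0 \\ 0 & -4.0000e+00 & 4.0000e+00 \\ 1.5326e+00 & -5.5444e+00 & 3.6268e+00 \end{bmatrix};$$

$$A_\delta^{(2)} = \begin{bmatrix} -2.7312e-01 & 2.4760e-01 & 2.6160e-02 \\ -1.1240e-01 & 1.5824e-01 & -8.9920e-02 \\ 4.9780e+00 & -2.2837e+00 & 8.2348e+00 \end{bmatrix}; \quad A_\delta^{(3)} = 0;$$

$$A_\delta^{(4)} = \begin{bmatrix} -4.0000e+00 & 4.0000e+00 & 0 \\ 0 & -4.0000e+00 & 4.0000e+00 \\ 1.5295e+00 & -5.5272e+00 & 3.6100e+00 \end{bmatrix};$$

$B_\delta^{(1)} = [0 \;\; 0 \;\; 4]^T$;

$B_\delta^{(2)} = [0 \;\; 0 \;\; 4]^T$;

$C_\delta^{(1)} = [1.1410e-02 \;\; -5.4000e-03 \;\; 1.9560e-02]$;

$C_\delta^{(2)} = [1.1640e-02 \;\; -5.4500e-03 \;\; 1.9600e-02]$;

$D_\delta = [9.4300e-03]$.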


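BL δ-model $\{A_{\delta B}, B_{\delta B}, C_{\delta B}, D_{\delta B}\}$.

$$A_{\delta B}^{(1)} = \begin{bmatrix} -5.4089e-01 & 1.0722e+00 & -1.3919e-01 \\ -1.0722e+00 & -1.6494e+00 & 1.5361e+00 \\ -1.3919e-01 & -1.5361e+00 & -2.1829e+00 \end{bmatrix};$$

$$A_{\delta B}^{(2)} = \begin{bmatrix} 1.7176e+00 & -1.3506e+00 & 5.0755e-01 \\ 1.3508e+00 & -1.0604e+00 & 4.0537e-01 \\ 5.0926e-01 & -3.9007e-01 & 1.2969e-01 \end{bmatrix}; \quad A_{\delta B}^{(3)} = 0;$$

$$A_{\delta B}^{(4)} = \begin{bmatrix} -5.4054e-01 & 1.0704e+00 & -1.3980e-01 \\ -1.0704e+00 & -1.6523e+00 & 1.5464e+00 \\ -1.3981e-01 & -1.5464e+00 & -2.1971e+00 \end{bmatrix};$$

$B_{\delta B}^{(1)} = [1.2714e-01 \;\; 9.9759e-02 \;\; 3.7129e-02]^T$;

$C_{\delta B}^{(1)} = [1.3119e+00 \;\; 1.0311e+00 \;\; 3.8833e-01]$;

$B_{\delta B}^{(2)} = [1.3118e+00 \;\; -1.0315e+00 \;\; 3.8682e-01]^T$;

$C_{\delta B}^{(2)} = [1.2718e-01 \;\; -9.9750e-02 \;\; 3.7080e-02]$;

$D_{\delta B} = [9.4300e-03]$.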


4.2.  Simulations 

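Normalized frequency response of $\{A_q, B_q, C_q, D_q\}$ is $H_q(e^{j\omega_1}, e^{j\omega_2})$ and that of $\{A_\delta, B_\delta, C_\delta, D_\delta\}$ is $H_\delta((e^{j\omega_1} - 1)/\Delta_h, (e^{j\omega_2} - 1)/\Delta_v)$. Frequency responses are evaluated on $\Omega^2 = \{(\omega_1, \omega_2) : \omega_i = n_i \times \pi/N, \; n_i = [-N : 1 : N], \; i = 1, 2\}$ with $N = 32$. For comparison purposes, the following measure was also evaluated: For $(z_1, z_2) = (e^{j\omega_1}, e^{j\omega_2})$ and $(c_1, c_2) = ((e^{j\omega_1} - 1)/\Delta_h, (e^{j\omega_2} - 1)/\Delta_v)$,

$$E_{\max} = \begin{cases} \max_{\Omega^2} |\tilde{H}(z_1, z_2) - H(z_1, z_2)| & \text{for q-models;} \\ \max_{\Omega^2} |\tilde{H}(c_1, c_2) - H(c_1, c_2)| & \text{for δ-models.} \end{cases}$$

Here, $H$ denotes the 'ideal' frequency response where each coefficient is represented in 'infinite' precision; $\tilde{H}$ denotes the 'actual' frequency response where each coefficient is represented in finite precision.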

Fig. (1) shows $E_{\max}$ versus number of fractional bits, where each coefficient is represented in FXP and its fractional part is truncated at different lengths; the integral part is represented exactly. The advantage gained by the BL δ-model over the BL q-model is about 3 bits.
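The coefficient truncation used for this experiment can be sketched in a few lines of Python. The paper does not state the truncation convention, so the sketch below assumes truncation toward minus infinity (as in a two's complement representation) with the integral part kept exactly; the function name is hypothetical.

```python
import math


def truncate_fractional(x, frac_bits):
    """Keep frac_bits fractional bits of x, discarding the rest.

    Assumption: truncation is toward minus infinity (floor), which is
    what dropping low-order bits does in two's complement.
    """
    scale = 1 << frac_bits          # 2 ** frac_bits
    return math.floor(x * scale) / scale


# Coefficients taken from the third row of the first block of the given q-model.
for coeff in (0.38315, -1.3861, 1.9067):
    print(coeff, truncate_fractional(coeff, 4), truncate_fractional(coeff, 8))
```

Under this convention the truncation error always lies in the interval [0, 2^-b) for b fractional bits, so each additional bit halves the worst-case coefficient error.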


Fig. (2) shows $E_{\max}$ versus total number of bits, where each coefficient is represented in FXP and its total (integral+fractional) number of bits is truncated at different lengths. The advantage gained by the BL δ-model over the BL q-model is only about 1 bit. This modest improvement is due to the large $\Delta_h$ and $\Delta_v$ being used. More dramatic improvements require smaller $\Delta_h$ and $\Delta_v$ [6]. However, this makes the δ-model's coefficients occupy a larger dynamic range. To circumvent this, we believe, careful scaling of filter coefficients is necessary. We are currently investigating this possibility.


Fig. (3) shows $E_{\max}$ versus number of mantissa bits, where each coefficient is represented in FLP and its number of mantissa bits is truncated at different lengths. Of course, in FLP, dynamic range is usually not a concern.


V.  CONCLUSION 

In this work, we have presented the relationship between BL realizations of corresponding δ- and q-models. This, in turn, addresses computation of gramians and BL realizations of δ-models.

In the FXP case, the δ-model is better whenever $\Delta_h < 1$ and $\Delta_v < 1$ [6]. However, this choice must be made carefully since, in FXP, δ-models tend to occupy a larger dynamic range. The authors are currently investigating the possibility of incorporating scaling of coefficients so that low values of $\Delta_h$ and $\Delta_v$ may be used to expose and exploit the advantages of δ-systems. In the FLP case, such a limitation does not usually arise, and δ-models are better whenever the system matrix eigenvalues lie within a certain region called the MG-region [14]. This condition is typically true for high Q, narrowband digital filters operating under high sampling rates. These observations indicate that, in FLP, for comparable performance (with respect to coefficient sensitivity), δ-models require a shorter mantissa length. The ensuing implications regarding low power consumption, low cost and weight, and high speed cannot be overemphasized. The authors are currently completing work regarding quantization noise properties of the δ-model developed, where, as in the 1-D case, improvements over the corresponding q-model are expected.

We  must  mention  that  certain  difficulties  regarding  limit 

ties  are  inherent  in  6-systems  when  FXP  arithmetic  is  used 
|.  However,  this  problem  is,  for  all  practical  purposes, 
nonexistent in FLP arithmetic. Hence, in our opinion, for FLP high performance applications, the δ-model developed provides an extremely attractive solution that avoids numerical ill-conditioning typically associated with high speed q-systems.
An Exhaustive Search Algorithm For Checking
Limit Cycle Behavior Of Digital Filters

Abstract: In this paper, an algorithm that can be utilized to determine the presence or absence of limit cycles in fixed-point implementation of digital filters is given. It is applicable for filters in state-space formulation (and hence, application to the corresponding direct form follows as a special case), and is independent of the order, type of quantization, and whether the accumulator is single- or double-length. Bounds on the amplitude and period of possible limit cycles are presented. The robustness of the algorithm in terms of limit cycle performance with respect to filter coefficient perturbations is verified. The algorithm is then used to obtain regions in the coefficient space where a filter of given order is limit cycle free. In this process, we have obtained limit cycle free regions that were previously unknown for the two's complement case.


I.  Introduction 

In realizing a digital filter, its coefficients and intermediate results of computations must be stored in registers of finite wordlength. Care must be taken to suppress resulting limit cycles, as otherwise performance degradation may render the design unacceptable.

This  has  been  a  research  topic  of  interest  in  recent  years 
(see  [1-3],  and  references  therein).  Most  existing  results 
however  focus  on  the  signed  magnitude  (SM)  rounding  and 
truncation  schemes.  Recently,  some  work  on  the  two’s  com¬ 
plement  (TC)  truncation  scheme  has  also  appeared  [5-7]. 

In  what  follows,  an  algorithm  that  may  be  used  to  check 
for  limit  cycles  of  a  given  digital  filter  implemented  in  fixed- 
point  (FXP)  arithmetic  is  proposed.  It  possesses  a  wide 
scope  of  applicability:  The  filter  may  be  of  any  order;  the 
quantization  scheme  may  be  arbitrary  (including  trunca¬ 
tion  and  rounding  schemes  corresponding  to  SM  and  TC); 
and  the  accumulator  may  be  of  single-  or  double-length. 
For  a  given  digital  filter,  bounds  on  the  amplitude  and  pe¬ 
riod  of  possible  limit  cycles  are  developed.  The  algorithm 
is  based  on  an  exhaustive  search  over  all  these  possibilities. 
Extending  the  same  procedure  to  the  entire  linear  stability 
region,  one  may  now  utilize  it  to  obtain  regions  in  filter 
coefficient  space  where  the  filter  is  globally  asymptotically 
stable  (g.a.s.).  The  robustness  of  the  algorithm  in  terms  of 
the  absence  of  limit  cycles  with  respect  to  filter  coefficient 
perturbations  may  also  be  verified.  The  algorithm  in  [3], 
although  developed  with  the  same  objectives  in  mind,  is 
only applicable to filters implemented in direct form. In contrast, the proposed algorithm is applicable for the more general state-space (s.s.) formulation. Of course, the direct form then follows as a special case.

II. Amplitude and period bounds on limit cycles.
In general, the quantization nonlinearity, $Q[\cdot]$, satisfies

$$|x - Q[x]| \le \epsilon \cdot q, \quad \forall x \in \Re, \tag{1}$$

where $q$ is the quantization step size and $\epsilon$ is the normalized quantization error. In particular, for roundoff, $\epsilon = 0.5$; for truncation, $\epsilon = 1$. Note that all filter parameters may be expressed as integer multiples of the quantization step size $q$. Hence, for convenience, we normalize $q$ to unity. The quantization nonlinearity thus becomes an integer valued function, viz., $Q : \Re \to \mathcal{Z}$, where $\Re$ and $\mathcal{Z}$ denote the sets of reals and integers. Typically, for all quantization schemes of interest, $Q[0] = 0$. Consider a digital filter of order $m$ in its minimal s.s. representation $\{A, B, C, D\}$:


x(A;  +  1)  =  A  •  x(/:)  +  B  •  u(/:);  (2) 

y(A:)  =  C.x(fc)  +  D.u(A:),  (3) 

where  x  G  3R  is  the  state,  u  is  the  input,  and  y  is  the  output. 
Also^  A  e  For  addressing  limit  cycle  performance, 

consider  the  zero  input  recursive  state  equation 

x(/:  +  l)  =  A.x(A:).  (4) 


We  only  consider  linearly  stable  filters,  that  is,  all  eigen¬ 
values  of  A  are  inside  the  unit  circle  in  C  (set  of  complex 
numbers). 
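As an illustration of the quantization nonlinearity in (1) with the step size normalized to unity, the following minimal Python sketch implements three common quantizers and measures their worst-case error on a small grid. The function names (`sm_round`, `sm_truncate`, `tc_truncate`, `max_error`) and the sample grid are assumptions made for this example; they do not come from the original text.

```python
from fractions import Fraction
from math import floor

def sm_round(v):
    """Signed magnitude roundoff: nearest integer, halves away from zero."""
    m = floor(abs(v) + Fraction(1, 2))
    return m if v >= 0 else -m

def sm_truncate(v):
    """Signed magnitude truncation: discard the fraction (toward zero)."""
    m = floor(abs(v))
    return m if v >= 0 else -m

def tc_truncate(v):
    """Two's complement truncation: round toward minus infinity."""
    return floor(v)

def max_error(Q, samples):
    """Largest |x - Q[x]| observed over the given samples."""
    return max(abs(x - Q(x)) for x in samples)

# Exact rational samples in [-5, 5] with spacing 1/8 (illustrative grid).
samples = [Fraction(k, 8) for k in range(-40, 41)]
for Q in (sm_round, sm_truncate, tc_truncate):
    print(Q.__name__, "max error:", max_error(Q, samples), "Q[0] =", Q(0))
```

On this grid the roundoff error never exceeds 0.5 and the truncation errors stay below 1, in line with the values of the normalized quantization error quoted for (1).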

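Now, under finite wordlength conditions, (4) together with the pertinent quantization nonlinearity may be modeled as

$$ x(k+1) = Q[A \cdot x(k)]. \tag{5} $$

The effect of this nonlinearity depends on whether the products are stored with full precision or whether quantization is performed immediately after each product is computed. Considering (4), we get the following:

If the products can be stored with full precision, that is, if a double-length accumulator is available,

$$ x(k+1) = \begin{pmatrix} Q[a_{11} \cdot x_1(k) + a_{12} \cdot x_2(k) + \cdots + a_{1m} \cdot x_m(k)] \\ \vdots \\ Q[a_{m1} \cdot x_1(k) + a_{m2} \cdot x_2(k) + \cdots + a_{mm} \cdot x_m(k)] \end{pmatrix}, \tag{6} $$

where $x_j(k)$ is the $j$th component of $x(k)$ and $a_{ij}$ is the $(i,j)$th entry of $A$. If each product is quantized immediately after it is computed, that is, if only a single-length accumulator is available,

$$ x(k+1) = \begin{pmatrix} Q[a_{11} \cdot x_1(k)] + Q[a_{12} \cdot x_2(k)] + \cdots + Q[a_{1m} \cdot x_m(k)] \\ \vdots \\ Q[a_{m1} \cdot x_1(k)] + Q[a_{m2} \cdot x_2(k)] + \cdots + Q[a_{mm} \cdot x_m(k)] \end{pmatrix}. \tag{7} $$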
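The difference between the double-length and single-length accumulator updates can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the 2x2 coefficient matrix, and the choice of two's complement truncation as the quantizer are hypothetical and not taken from the original text.

```python
from fractions import Fraction
from math import floor

def tc_truncate(v):
    """Two's complement truncation: round toward minus infinity."""
    return floor(v)

def step_double(A, x, Q):
    """Double-length accumulator: exact products, one quantization per row."""
    return [Q(sum(a * xj for a, xj in zip(row, x))) for row in A]

def step_single(A, x, Q):
    """Single-length accumulator: every product is quantized before summing."""
    return [sum(Q(a * xj) for a, xj in zip(row, x)) for row in A]

# Hypothetical coefficient matrix and integer state (units of the step size).
A = [[Fraction(1, 2), Fraction(1, 2)],
     [Fraction(-1, 4), Fraction(3, 4)]]
x = [3, 1]

print(step_double(A, x, tc_truncate))  # one quantization error per state
print(step_single(A, x, tc_truncate))  # up to m quantization errors per state
```

For this state the two updates give different next states, which is why the error bound depends on how many quantizations contribute to each state component.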


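Equations (6) and (7) may be expressed in a unified manner:

$$ x(k+1) = A \cdot x(k) + e(k), \quad \text{with } |e_i(k)| \le N \cdot \epsilon, \tag{8} $$

where $e(k) = \{e_i(k)\} \in \mathbb{R}^m$ is the quantization error vector. If (6) is applicable, $N = 1$; if (7) is applicable, $N = m$.

We note that (8) is a description of a linear system driven by the bounded input signal $e(k)$. Hence, we have in fact converted the nonlinear systems in (6) and (7) into the linear system in (8).

Now, the transfer function between $e(k)$ and $x(k)$ is

$$ \frac{X(z)}{E(z)} = (zI - A)^{-1} \in \mathbb{R}(z)^{m \times m}, \tag{9} $$

where $X$ and $E$ are the $z$-transforms of $x$ and $e$, respectively, $\mathbb{R}(z)^{m \times m}$ is the set of matrices of size $m \times m$ over the rational functions in $z \in \mathbb{C}$, and $I$ is the identity matrix. This, when expanded, becomes

$$ \frac{X(z)}{E(z)} = \begin{pmatrix} H_{11}(z) & H_{12}(z) & \cdots & H_{1m}(z) \\ \vdots & \vdots & & \vdots \\ H_{m1}(z) & H_{m2}(z) & \cdots & H_{mm}(z) \end{pmatrix}, $$

where $H_{ij}(z) \in \mathbb{R}(z)$. Hence,

$$ X_i(z) = \sum_{j=1}^{m} H_{ij}(z) \cdot E_j(z), \quad i = 1, 2, \ldots, m, \tag{10} $$

where $X(z) = \{X_i\}$ and $E(z) = \{E_j\}$.

Therefore $x_i(k) = \sum_{j=1}^{m} h_{ij}(k) * e_j(k)$, $i = 1, 2, \ldots, m$, where $h_{ij}(k)$ is the impulse response of $H_{ij}(z)$ and $*$ denotes convolution. Hence

$$ x_i(k) = \sum_{j=1}^{m} \sum_{r=0}^{\infty} h_{ij}(r) \cdot e_j(k - r), \quad i = 1, 2, \ldots, m. \tag{11} $$

Noting $|e_j(k)| \le N \cdot \epsilon$, from (11), we get

$$ |x_i(k)| \le N \cdot \epsilon \cdot \sum_{j=1}^{m} \sum_{k=0}^{\infty} |h_{ij}(k)|, \quad i = 1, 2, \ldots, m. \tag{12} $$

Therefore

$$ M_i = N \cdot \epsilon \cdot \sum_{j=1}^{m} \sum_{k=0}^{\infty} |h_{ij}(k)|, \quad i = 1, 2, \ldots, m. \tag{13} $$

$M_i$ is the upper bound for the absolute value of $x_i(k)$. To estimate a useful upper bound for $x_i$, we need to compute $\sum_{j=1}^{m} \sum_{k=0}^{\infty} |h_{ij}(k)|$ for a given filter. We address this now.

Consider the transfer function $H_{ij}(z)$.

All poles of $H_{ij}(z)$ are distinct: $H_{ij}(z)$ may be expressed as

$$ H_{ij}(z) = K_{ij} + \frac{r_{ij}^{(1)}}{1 - P_{ij}^{(1)} z^{-1}} + \cdots + \frac{r_{ij}^{(m)}}{1 - P_{ij}^{(m)} z^{-1}}, $$

where $r_{ij}^{(k)}, P_{ij}^{(k)} \in \mathbb{C}$ and $K_{ij} \in \mathbb{R}$; $r_{ij}^{(k)}$ is the $k$th residue of $H_{ij}(z)$ and $P_{ij}^{(k)}$ are the poles. Taking the inverse $z$-transform, we get

$$ h_{ij}(k) = K_{ij} \cdot \delta(k) + r_{ij}^{(1)} [P_{ij}^{(1)}]^k + \cdots + r_{ij}^{(m)} [P_{ij}^{(m)}]^k, $$

where $\delta(k)$ is the discrete-time unit impulse (the source calls it the Dirac delta function). Therefore

$$ \sum_{k=0}^{\infty} |h_{ij}(k)| \le |K_{ij}| + |r_{ij}^{(1)}| (1 - |P_{ij}^{(1)}|)^{-1} + \cdots + |r_{ij}^{(m)}| (1 - |P_{ij}^{(m)}|)^{-1}. $$

This, when expanded, gives

$$ \sum_{j=1}^{m} \sum_{k=0}^{\infty} |h_{ij}(k)| \le \sum_{j=1}^{m} \left( |K_{ij}| + \cdots + (1 - |P_{ij}^{(m)}|)^{-1} \cdot |r_{ij}^{(m)}| \right), $$

for $i = 1, 2, \ldots, m$. Hence

$$ M_i \le N \cdot \epsilon \cdot \sum_{j=1}^{m} \left( |K_{ij}| + \cdots + (1 - |P_{ij}^{(m)}|)^{-1} \cdot |r_{ij}^{(m)}| \right), \tag{14} $$

for $i = 1, 2, \ldots, m$. Convergence of this follows from linear stability.

Remark. The method in [3] tends to be easier to implement and more general with regards to its capability of handling poles of higher multiplicity. However, in our experience, the technique described above often leads to lower upper bounds. Note that the technique in [3] utilizes an interpretation that involves a cascade of first-order sections whereas the technique above utilizes a parallel combination. Of course, no one technique will provide a lower bound for all situations. If computer cost is of concern, one can run both techniques and utilize the lower value of the bound. □

$H_{ij}(z)$ contains a pole with multiplicity $\gamma$: Let this pole of multiplicity $\gamma$ be $P$. Then,

$$ H_{ij}(z) = K_{ij} + \frac{r_{ij}^{(1)}}{(1 - P z^{-1})} + \frac{r_{ij}^{(2)}}{(1 - P z^{-1})^2} + \cdots + \frac{r_{ij}^{(\gamma)}}{(1 - P z^{-1})^{\gamma}}. $$

This analysis differs for the general term $(1 - P z^{-1})^{-\ell}$, where $\ell = 2, 3, \ldots, \gamma$. At this point, due mainly to its ease of implementation, we utilize the technique in [3]. Considering the general term and taking the inverse $z$-transform,

$$ \sum_{k=0}^{\infty} \left| \mathcal{Z}^{-1} \left\{ \frac{1}{(1 - P z^{-1})(1 - P z^{-1}) \cdots (1 - P z^{-1})} \right\} \right| \le \left[ \frac{1}{1 - |P|} \right]^{\ell}. $$

This expression is now substituted for the pole of multiplicity $\gamma$.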
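The amplitude bound $M_i$ of (13) can also be estimated numerically instead of through residues. The sketch below is an assumption-laden alternative, not the partial-fraction method described in the text: it uses the fact that the impulse response of $(zI - A)^{-1}$ consists of the powers of $A$, and sums absolute values until the terms fall below a tolerance. The function name `amplitude_bounds` and the parameters `tol` and `max_terms` are hypothetical. Because the series is truncated, the result slightly underestimates the infinite sum.

```python
def amplitude_bounds(A, eps, N, tol=1e-12, max_terms=100_000):
    """Estimate M_i = N * eps * sum_j sum_k |h_ij(k)| by summing |A^n| row-wise.

    A   : m x m coefficient matrix (list of lists), assumed linearly stable
    eps : normalized quantization error (0.5 for roundoff, 1 for truncation)
    N   : 1 for a double-length accumulator, m for a single-length one
    """
    m = len(A)
    # P holds A^n, starting from the identity matrix (n = 0).
    P = [[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]
    sums = [0.0] * m
    for _ in range(max_terms):
        row_abs = [sum(abs(v) for v in row) for row in P]
        for i in range(m):
            sums[i] += row_abs[i]
        if max(row_abs) < tol:  # remaining terms are negligible
            break
        P = [[sum(P[i][l] * A[l][j] for l in range(m)) for j in range(m)]
             for i in range(m)]
    return [N * eps * s for s in sums]

# Illustrative diagonal filter: the sums are geometric series 2 and 4/3.
print(amplitude_bounds([[0.5, 0.0], [0.0, 0.25]], eps=1.0, N=1))
```

For a diagonal matrix the result can be checked by hand against the geometric series, which makes this a convenient sanity test before applying the routine to a general realization.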

Lemma 1: The zero-input response of the state $x(k)$ of the digital filter described by eqn (6) or (7) is periodic. Its period $T$ satisfies

$$ T \le \prod_{i=1}^{m} (2 \cdot \lfloor M_i \rfloor + 1) = T_{\max}, \tag{15} $$

where $\lfloor M_i \rfloor$ is the largest integer not more than $M_i$ given by eqn (13).

Proof: Consider eqn (6) or (7). The steady-state solution of each state $x_i(k)$ will satisfy $|x_i(k)| \le M_i$, $\forall k$, $i = 1, 2, \ldots, m$. Under FXP arithmetic, $x(k) \in \mathbb{Z}^m$ and hence $|x_i(k)| \le \lfloor M_i \rfloor$. $x_i(k)$ can therefore take only a finite number of values, namely, $(2 \cdot \lfloor M_i \rfloor + 1)$. Hence, $x(k)$ can take only a finite number of values, namely, $\prod_{i=1}^{m} (2 \cdot \lfloor M_i \rfloor + 1)$. Note that the current state vector $x(k)$ uniquely determines the next state vector $x(k+1)$ through the function $Q[\cdot]$. Thus, $x(k)$ must be periodic in $k$. Its period is bounded by

$$ T_{\max} = \prod_{i=1}^{m} (2 \cdot \lfloor M_i \rfloor + 1). \tag{16} $$
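The period bound of Lemma 1 is a simple product once the amplitude bounds are known. A minimal Python sketch follows; the function name `period_bound` and the sample bounds are assumptions for illustration only.

```python
from math import floor, prod

def period_bound(M):
    """T_max: number of integer state vectors with |x_i| <= floor(M_i)."""
    return prod(2 * floor(Mi) + 1 for Mi in M)

# Illustrative amplitude bounds M_1 = 2.0 and M_2 = 4/3.
print(period_bound([2.0, 4 / 3]))
```

Since the same product counts the candidate state vectors, it also indicates how quickly an exhaustive search grows with the filter order and with the amplitude bounds.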

III.  Algorithm  Description 

We  now  formulate  the  theoretical  basis  for  the  algorithm 
and  discuss  some  of  its  computational  aspects. 

Definition 1: The digital filter realization in (8) is said to be globally asymptotically stable (g.a.s.) if and only if, for any initial state $x(0) \in \mathbb{Z}^m$ with $\|x(0)\|_\infty \le B$, where $B > 0$, there exists $L \in \mathbb{Z}_+$ such that $x(k) = 0$, where $0$ is the null vector, for $k \ge L$.

Remark. Typically, g.a.s. is taken to hold when $x(k) \to 0$ as $k \to \infty$ (under the conditions above). However, due to the finite wordlength available, the filter behaves as a finite state machine, and Definition 1 suffices.

Lemma 2: Consider $\eta > 0$ and any initial state vector $x(0)$ such that $|x_i(0)| \le B_i$ with $B_i > M_i$, for $i = 1, 2, \ldots, m$. Then, there exists a sufficiently large positive number $L$, such that the digital filter in (6) or (7) satisfies $|x_i(k)| < M_i + \eta$, $\forall k \ge L$, for $i = 1, 2, \ldots, m$.

Proof: Since $A$ is assumed to be linearly stable, the linear system in (8) is in fact g.a.s. Hence, (8) will yield a set of nonhomogeneous linear shift-invariant difference equations which will have its solution in two parts: a steady-state solution $s(k)$ and a transient solution $t(k)$. Clearly, with g.a.s., given $\eta > 0$, we can choose $k$ sufficiently large, say, $k \ge L$, such that $\max |t_i(k)| < \eta$ for $i = 1, 2, \ldots, m$. Since $M_i \in \mathbb{Z}_+$, for $k \ge L$, $M_i + \eta$ will act as a true upper bound for $x_i(k)$ in (8). □

Hence, it suffices to check the state vectors in

$$ S^{(0)} = \{ x(k) \in \mathbb{Z}^m \; | \; |x_i(k)| \le \lfloor M_i \rfloor, \; i = 1, 2, \ldots, m \}, \tag{17} $$

to see if they are mapped to $0$ by (8) after a finite number of iterations.

Computational Aspects: The computations within the algorithm are carried out in two stages. Initially, all vectors $x(k) \in S^{(0)}$ which map to $0$ in less than $T_{\max}$ recursions (after all, if limit cycles exist, the maximum period is $T_{\max}$) are eliminated from $S^{(0)}$. The remaining vectors in $S^{(0)}$ are then further checked for convergence (see Section B).

Section A. Consider the set $V^{(1)}$ where

$$ V^{(1)} = \{ x(k) \in S^{(0)} \; | \; Q[A \cdot x(k)] = 0 \}. \tag{18} $$

Hence, $V^{(1)}$ consists of all the vectors $x(k) \in S^{(0)}$ that map to $0$ in one and only one iteration of (6) or (7). Any other convergent vector in $S^{(0)}$ must map to $V^{(1)}$ prior to reaching $0$. Hence, form

$$ S^{(1)} = S^{(0)} \setminus V^{(1)}. \tag{19} $$

Note that $\mathcal{K}[S^{(1)}] = \mathcal{K}[S^{(0)}] - \mathcal{K}[V^{(1)}]$. In fact, $\mathcal{K}[S^{(0)}] = T_{\max}$. $\mathcal{K}[\cdot]$ defines the cardinality of a set.

Furthermore, any vector in $S^{(1)}$ which is mapped to $V^{(1)}$ by (6) or (7) in one iteration will also converge to $0$. Hence, form

$$ V^{(2)} = \{ x(k) \in S^{(1)} \; | \; Q[A \cdot x(k)] \in V^{(1)} \}. \tag{20} $$

Hence, $V^{(2)}$ consists of all the vectors $x(k) \in S^{(1)}$ that map to $0$ in exactly two iterations of (6) or (7). Hence, form

$$ S^{(2)} = S^{(1)} \setminus V^{(2)}. \tag{21} $$

Note that $\mathcal{K}[S^{(2)}] = \mathcal{K}[S^{(0)}] - \mathcal{K}[V^{(1)}] - \mathcal{K}[V^{(2)}]$.

Likewise, we get the following sets: For $L = 1, 2, \ldots, T_{\max}$ (with $V^{(0)} = \{0\}$, so that (22) reduces to (18) for $L = 1$),

$$ V^{(L)} = \{ x(k) \in S^{(L-1)} \; | \; Q[A \cdot x(k)] \in V^{(L-1)} \}, \tag{22} $$

and

$$ S^{(L)} = S^{(L-1)} \setminus V^{(L)}. \tag{23} $$

Note that $\mathcal{K}[S^{(L)}] = \mathcal{K}[S^{(0)}] - \sum_{i=1}^{L} \mathcal{K}[V^{(i)}]$.

The conditions under which this construction is terminated and their implications are as follows:

(1) If

$$ \mathcal{K}[S^{(L)}] = 0, \quad \text{for some } L = 1, 2, \ldots, T_{\max} - 1, \tag{24} $$

all vectors in $S^{(0)}$ are convergent.

(2) If

$$ \mathcal{K}[V^{(L)}] = 0, \quad \text{for some } L = 1, 2, \ldots, T_{\max}, \tag{25} $$

then

$$ S^{(i)} = S^{(i-1)}, \quad i = L, L+1, \ldots, T_{\max}. \tag{26} $$

Under this situation, the remaining vectors in $S^{(L)}$ (there are $\mathcal{K}[S^{(L)}]$ of them) will be further checked for convergence (see Section B).

Remark. Upon a little reflection, one notices that ... must either be empty or contain one and only one vector from ...

Section B. Although the reverse mapping procedure outlined above reduces the computational complexity considerably, it may not capture all the vectors in $S^{(L)}$, $L = 1, \ldots, T_{\max}$, that map to $0$ within $T_{\max}$ iterations. This is due to the fact that there may be vectors in $S^{(L)}$ that map to $0$ through a vector not belonging to $S^{(0)}$. Hence, when encountered with condition (2) above, convergence of each remaining vector in $S^{(L)}$ is determined by checking whether it is mapped to $0$ in less than $T_{\max}$ iterations through either (6) or (7), whichever is applicable. This exhaustive technique is in fact an extension of that given in [3] to digital filters represented in their s.s. realization. However, we must emphasize the significant computational advantage gained by first invoking the mapping procedure in Section A. Assuming condition (2) has occurred, let

$$ S^{(L)} = \{ x_1^{(L)}, x_2^{(L)}, \ldots, x_{\mathcal{K}[S^{(L)}]}^{(L)} \}. \tag{27} $$


Note that, when condition (2) has occurred, from (26), $S^{(L)} = S^{(L+1)} = \cdots = S^{(T_{\max})}$. For each vector $x_\ell^{(L)} \in S^{(L)}$, construct the orbit consisting of all state vectors $x_\ell^{(L)}(j)$, for $j = 1, 2, \ldots, T_{\max}$, that are consecutively generated by (6) or (7) (whichever is applicable) with $x_\ell^{(L)}$ as the initial state, that is, $x_\ell^{(L)} = x_\ell^{(L)}(0)$. For each $\ell = 1, 2, \ldots, \mathcal{K}[S^{(L)}]$, the conditions under which the construction of each orbit is terminated and their implications are as follows:

(1) If

$$ x_\ell^{(L)}(j) = 0, \quad \text{for some } j = 1, 2, \ldots, T_{\max}, \tag{28} $$

then $x_\ell^{(L)}$, together with each vector in the orbit, will be convergent.

(2) If

$$ x_\ell^{(L)}(j) = x_\ell^{(L)}(k), \quad \text{for } j \ne k, \tag{29} $$

then $x_\ell^{(L)}$ gives rise to limit cycles.

Remark. These are in fact the only conditions that can occur when either (6) or (7) generates the orbit.

Note: An analysis was carried out to determine the robustness regions under single- and double-length environments, the results of which are found in [8].
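The two-stage procedure of Sections A and B can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses the double-length accumulator update only, takes the integer amplitude bounds as an input list `M_floor`, and adds a hypothetical `max_steps` guard so that the orbit check always terminates. The function names and the first-order test filter are likewise assumptions.

```python
from fractions import Fraction
from itertools import product
from math import floor

def tc_truncate(v):
    """Two's complement truncation (round toward minus infinity)."""
    return floor(v)

def sm_truncate(v):
    """Signed magnitude truncation (round toward zero)."""
    m = floor(abs(v))
    return m if v >= 0 else -m

def step(A, x, Q):
    """One zero-input update with a double-length accumulator."""
    return tuple(Q(sum(a * xj for a, xj in zip(row, x))) for row in A)

def is_gas(A, M_floor, Q, max_steps=10_000):
    """Return True if every state in the bounded set is driven to zero."""
    zero = (0,) * len(A)
    # Candidate set: all integer vectors with |x_i| <= M_floor[i].
    S = set(product(*(range(-b, b + 1) for b in M_floor)))
    image = {x: step(A, x, Q) for x in S}

    # Section A: reverse mapping. V starts as the vectors mapped to 0 in
    # one step; each pass removes V from S and collects the vectors that
    # map into the previous V.
    V = {x for x in S if image[x] == zero}
    while V:
        S -= V
        V = {x for x in S if image[x] in V}

    # Section B: follow the orbit of every remaining vector directly.
    for x0 in S:
        x, seen = x0, set()
        for _ in range(max_steps):
            if x == zero:
                break            # orbit reached the origin: convergent
            if x in seen:
                return False     # state repeated: limit cycle
            seen.add(x)
            x = step(A, x, Q)
        else:
            return False         # undecided within max_steps
    return True

A = [[Fraction(1, 2)]]           # hypothetical first-order filter
print(is_gas(A, [2], tc_truncate))
print(is_gas(A, [2], sm_truncate))
```

For this first-order example, two's complement truncation leaves the state stuck at -1 (a DC limit cycle), whereas signed magnitude truncation drives every state to zero.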


IV.  Some  Examples 

The proposed algorithm is applied to a dense grid in the coefficient space to obtain the total g.a.s. region. With the robustness region due to the variation of the coefficients, each point in the coefficient space is associated with a neighborhood where the filter is stable. A 10-bit wordlength is assumed for all computations. Therefore, the filter coefficients are quantized to a multiple of $2^{-10}$. Within the linear stability region, dark areas indicate filters that possess limit cycles.

The results provided correspond to the most commonly encountered quantization schemes, namely, SM roundoff, SM truncation, and TC truncation schemes. In all cases, both single- and double-length accumulator implementation results were analyzed. All results are provided for a second-order filter. All results given in [3] for direct form filters were also verified using this algorithm.


Results for minimum norm realization of digital filters

Stability of digital filters in their minimum norm realization was also investigated. The coefficient matrix, in such a case, is

$$ A = \begin{pmatrix} \sigma & \omega \\ -\omega & \sigma \end{pmatrix}. $$

The results for SM roundoff for both single- and double-length accumulator environments were verified [9]. The stable region for the SM truncation scheme, for both single- and double-length accumulator environments, spans the entire linear stability region $\sigma^2 + \omega^2 < 1$.

For TC truncation, with a double-length accumulator, the g.a.s. region is shown in Figure 1(a). This in fact improves on the previously known results in [7]. For instance, the series of points that satisfy $\sigma < 0$ and $\omega = \pm\sigma$ are also limit cycle free. The following coefficient matrix belongs to this class:

$$ A = \begin{pmatrix} -\dfrac{672}{1024} & \dfrac{672}{1024} \\[2mm] -\dfrac{672}{1024} & -\dfrac{672}{1024} \end{pmatrix}. \tag{30} $$


To the authors' knowledge, no previous results are available for TC truncation in a single-length accumulator environment. The region of g.a.s. is shown in Figure 1(b).

V. Conclusion

A new algorithm capable of determining g.a.s. of any FXP digital filter in its s.s. formulation has been presented. The algorithm is applicable independent of the nonlinearity, number of nonlinearities, and order of filter. In most cases, the proposed algorithm is found to provide tighter bounds on the amplitude of limit cycles. Significant improvements over existing results for the TC truncation schemes in both single- and double-length accumulator environments have been presented. Current research is directed towards establishing regions within which limit cycles of a pre-specified period or bound exist.

Figure 1: The region where a filter with two's complement truncation is free of limit cycles: (a) double-length accumulator; (b) single-length accumulator.
Abstract:

Zero input asymptotic response behavior of general order 2-D digital filters with floating point arithmetic is investigated. In particular, conditions for the absence of so-called R1 and R2 responses (large amplitude limit cycles) are provided for 2-D first quarter-plane causal filters.

I. Introduction

Recently, floating point arithmetic has become popular for a number of digital signal processing applications. The implementation of digital filters in floating point format is especially attractive due to the high dynamic range and the high-level programming tools available.

Previous work on the convergence behavior of floating point digital filters concentrated on 1-D second order systems [1]. Some results on direct form filters are also available [2]. However, to the authors' knowledge, the case of general order 1-D or 2-D state space models has not been tackled.

This paper provides such an analysis, which can be applied to any digital filter structure of arbitrary order and dimension one or two. In order to avoid distinguishing among a number of reformatting and quantization schemes, the result introduced in this paper takes a parameterization approach to the error description. This allows the derived result to be applied to any type of floating point format.

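II. Preliminaries

Consider the Roesser model for the first quarter-plane causal 2-D system:

$$ \begin{pmatrix} x^h(n_1 + 1, n_2) \\ x^v(n_1, n_2 + 1) \end{pmatrix} = \begin{pmatrix} A_{hh} & A_{hv} \\ A_{vh} & A_{vv} \end{pmatrix} \begin{pmatrix} x^h(n_1, n_2) \\ x^v(n_1, n_2) \end{pmatrix}, \tag{1} $$

$$ A = \begin{pmatrix} A_{hh} & A_{hv} \\ A_{vh} & A_{vv} \end{pmatrix} \in \mathbb{R}^{(N_1 + N_2) \times (N_1 + N_2)}, \quad A_{hh} \in \mathbb{R}^{N_1 \times N_1}, \quad A_{vv} \in \mathbb{R}^{N_2 \times N_2}. \tag{2} $$

The submatrices $A_{vh}$ and $A_{hv}$ are of the appropriate dimensions. The vectors $x^h$ and $x^v$ are the horizontally and vertically propagating state vectors of the ideal system, respectively.

For floating point realizations of (1), the following error model describes the system behavior:

$$ \begin{pmatrix} \tilde{x}^h(n_1 + 1, n_2) \\ \tilde{x}^v(n_1, n_2 + 1) \end{pmatrix} = \begin{pmatrix} A_{hh} & A_{hv} \\ A_{vh} & A_{vv} \end{pmatrix} \begin{pmatrix} \tilde{x}^h(n_1, n_2) \\ \tilde{x}^v(n_1, n_2) \end{pmatrix} + \begin{pmatrix} e^h(n_1, n_2) \\ e^v(n_1, n_2) \end{pmatrix}, \tag{3} $$

where $e^h(n_1, n_2) \in \mathbb{R}^{N_1}$ and $e^v(n_1, n_2) \in \mathbb{R}^{N_2}$ are the error vectors for the horizontally and vertically propagating states, respectively.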
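To make the Roesser recursion concrete, the following Python sketch propagates the ideal (error-free) zero-input model over a finite grid. It is only an illustration: the function names, the callable boundary conditions `xh0` and `xv0`, and the scalar example blocks are assumptions and not part of the original text.

```python
def matvec(M, v):
    """Matrix-vector product for plain nested lists."""
    return [sum(a * b for a, b in zip(row, v)) for row in M]

def add(u, v):
    return [a + b for a, b in zip(u, v)]

def roesser_zero_input(A_hh, A_hv, A_vh, A_vv, xh0, xv0, n1_max, n2_max):
    """Propagate the zero-input Roesser model over 0..n1_max by 0..n2_max.

    xh0(n2) gives the boundary state x^h(0, n2); xv0(n1) gives x^v(n1, 0).
    Returns dictionaries keyed by (n1, n2).
    """
    xh = {(0, n2): xh0(n2) for n2 in range(n2_max + 1)}
    xv = {(n1, 0): xv0(n1) for n1 in range(n1_max + 1)}
    for n1 in range(n1_max + 1):
        for n2 in range(n2_max + 1):
            h, v = xh[(n1, n2)], xv[(n1, n2)]
            # Horizontal state advances in n1, vertical state in n2.
            xh[(n1 + 1, n2)] = add(matvec(A_hh, h), matvec(A_hv, v))
            xv[(n1, n2 + 1)] = add(matvec(A_vh, h), matvec(A_vv, v))
    return xh, xv

# Scalar blocks (N1 = N2 = 1) with unit boundary conditions.
xh, xv = roesser_zero_input([[0.5]], [[1.0]], [[0.0]], [[0.25]],
                            lambda n2: [1.0], lambda n1: [1.0], 2, 2)
print(xh[(1, 0)], xh[(1, 1)], xv[(0, 1)])
```

A finite wordlength version would add an error term after each update, which is exactly the role of the error vectors in the floating point model.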
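We also need to define the following transfer matrices:

$$ \begin{pmatrix} X^h(z_1, z_2) \\ X^v(z_1, z_2) \end{pmatrix} = \begin{pmatrix} H^{hh}(z_1, z_2) & H^{hv}(z_1, z_2) \\ H^{vh}(z_1, z_2) & H^{vv}(z_1, z_2) \end{pmatrix} \begin{pmatrix} E^h(z_1, z_2) \\ E^v(z_1, z_2) \end{pmatrix}, \tag{4} $$

where

$$ \begin{pmatrix} H^{hh}(z_1, z_2) & H^{hv}(z_1, z_2) \\ H^{vh}(z_1, z_2) & H^{vv}(z_1, z_2) \end{pmatrix} = \left[ \begin{pmatrix} z_1 I_1 & 0 \\ 0 & z_2 I_2 \end{pmatrix} - A \right]^{-1}. \tag{5} $$

In Equation (4), $X^h(z_1, z_2)$ and $X^v(z_1, z_2)$ are the $z$-transforms of the states $\tilde{x}^h(n_1, n_2)$ and $\tilde{x}^v(n_1, n_2)$, respectively. The transforms $H^{hh}(z_1, z_2)$, $H^{hv}(z_1, z_2)$, $H^{vh}(z_1, z_2)$ and $H^{vv}(z_1, z_2)$ are transfer submatrices of dimensions $N_1 \times N_1$, $N_1 \times N_2$, $N_2 \times N_1$, $N_2 \times N_2$, respectively. $E^h(z_1, z_2)$ and $E^v(z_1, z_2)$ are the 2-D $z$-transforms of the error signal vectors $e^h(n_1, n_2)$ and $e^v(n_1, n_2)$, respectively. Furthermore, in (5), $I_1$ and $I_2$ denote identity matrices of dimensions $N_1 \times N_1$ and $N_2 \times N_2$, respectively.

The components of $H^{hh}(z_1, z_2)$, $H^{hv}(z_1, z_2)$, $H^{vh}(z_1, z_2)$ and $H^{vv}(z_1, z_2)$ are 2-D transforms denoted by $H^{hh}_{ij}(z_1, z_2)$, $H^{hv}_{ij}(z_1, z_2)$, $H^{vh}_{ij}(z_1, z_2)$ and $H^{vv}_{ij}(z_1, z_2)$, respectively.

Denoting $\mathcal{Z}\{\cdot\}$ as the 2-D $z$-transform, we define the following impulse responses:

$$ H^{hh}_{ij}(z_1, z_2) = \mathcal{Z}\{h^{hh}_{ij}(n_1, n_2)\}; \quad i = 1, \ldots, N_1; \; j = 1, \ldots, N_1, \tag{6} $$

$$ H^{hv}_{ij}(z_1, z_2) = \mathcal{Z}\{h^{hv}_{ij}(n_1, n_2)\}; \quad i = 1, \ldots, N_1; \; j = 1, \ldots, N_2, \tag{7} $$

$$ H^{vh}_{ij}(z_1, z_2) = \mathcal{Z}\{h^{vh}_{ij}(n_1, n_2)\}; \quad i = 1, \ldots, N_2; \; j = 1, \ldots, N_1, \tag{8} $$

$$ H^{vv}_{ij}(z_1, z_2) = \mathcal{Z}\{h^{vv}_{ij}(n_1, n_2)\}; \quad i = 1, \ldots, N_2; \; j = 1, \ldots, N_2. \tag{9} $$

Next, we define the $\ell_1$-measures for each component of the transfer function submatrices:

$$ \bar{h}^{hh}_{ij} = \sum_{n_1 = 0}^{\infty} \sum_{n_2 = 0}^{\infty} |h^{hh}_{ij}(n_1, n_2)|; \quad i, j = 1, \ldots, N_1, \tag{10} $$

$$ \bar{h}^{hv}_{ij} = \sum_{n_1 = 0}^{\infty} \sum_{n_2 = 0}^{\infty} |h^{hv}_{ij}(n_1, n_2)|; \quad i = 1, \ldots, N_1; \; j = 1, \ldots, N_2, \tag{11} $$

$$ \bar{h}^{vh}_{ij} = \sum_{n_1 = 0}^{\infty} \sum_{n_2 = 0}^{\infty} |h^{vh}_{ij}(n_1, n_2)|; \quad i = 1, \ldots, N_2; \; j = 1, \ldots, N_1, \tag{12} $$

$$ \bar{h}^{vv}_{ij} = \sum_{n_1 = 0}^{\infty} \sum_{n_2 = 0}^{\infty} |h^{vv}_{ij}(n_1, n_2)|; \quad i, j = 1, \ldots, N_2. \tag{13} $$

Also:

$$ H^h_i = \sum_{\nu = 1}^{N_1} \bar{h}^{hh}_{i\nu} + \sum_{\nu = 1}^{N_2} \bar{h}^{hv}_{i\nu}, \tag{14} $$

$$ H^v_j = \sum_{\nu = 1}^{N_1} \bar{h}^{vh}_{j\nu} + \sum_{\nu = 1}^{N_2} \bar{h}^{vv}_{j\nu}. \tag{15} $$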
From [1,2] it is known that the following four state response types are encountered in floating point digital filters under zero input, if the linear filter is stable:

R1: an unbounded state response, eventually leading to overflow conditions.

R2: a bounded state response

R3: a bounded state response in underflow

R4: a zero-convergent response

III. The Main Result

The following theorem can now be formulated:

Theorem: A floating point implementation of the system in (1) for any finitely extended input signal and/or non-zero finitely extended initial conditions will produce a response of type R3, if the mantissa length $l_m$ satisfies

^  2  +  log2  ^  ^ 

where  H  =  HJ)  and  C  is  an  implementation  dependent  constant. 

Proof:  The  proof  is  rather  lengthy  and  will  be  supplied  in  the  final  version  of  the  paper. 

Formally, this result is similar to previous results on direct forms [1] and second order state-space systems [2]. In this case, the stability margin enters the inequality through $H$, which is a somewhat complicated measure of the degree of stability of the system. For an unstable system $H \to \infty$, and for any stable system we have $H < \infty$. The constant $C$ relates the magnitude of the state variables to the error bound. This number is usually small and is directly affected by the entries of the $A$-matrix and the floating point format.

IV.  Conclusion 

This paper presents a condition on the mantissa length of a 2-D floating point digital filter of arbitrary order, which ensures convergence of the state response into underflow, independent of the initial conditions. The mantissa length is linked to the margin of stability of the linear system as measured by $H$. It is also dependent on the realization itself. It should be noted that the response types R2 and R3 in the 2-D (and m-D) case do not need to be periodic [3].

Introduction

Traditional control and signal processing algorithms based on the shift operator (or q-operator) are ill-conditioned in high performance applications that involve fast sampling/shorter wordlength [1]. In these situations, q-operator based discrete-time implementations (or q-systems) are extremely sensitive to uncertainties inherent in modelling and parameter representation (in particular, with shorter wordlength).

Use of the incremental difference operator or delta-operator (or δ-operator) can provide an effective solution to such difficulties [1]. Compared to q-systems, δ-operator based implementations (or δ-systems) can provide superior performance with respect to (a) coefficient sensitivity of frequency response [1], and (b) quantization noise propagation [2]. Due mainly to these, and also due to the possibility of a unified treatment of both continuous- and discrete-time systems, work on δ-systems has recently attracted considerable attention (see [1-5], and references therein).

Problem Statement

Since the δ-operator can offer several important advantages over the q-operator for linear, time-invariant one-dimensional (1-D) systems, would similar advantages hold true for more general classes of systems? Work on linear, multi-dimensional (m-D) systems indicates that this may indeed be the case [5]. In this paper, we investigate the applicability of δ-operator based numerical schemes for simulation of nonlinear systems.

Delta-Operator Based Numerical Scheme

q-Operator Based Numerical Scheme. We consider the computation of the solution orbit of a nonlinear system of the type

$$ q[x](n) = f_q(x(n), a_q), \tag{1} $$

where $q[x](n) = x(n + 1)$. Here, $x(n) \in \mathbb{R}^M$ is the state at instant $n$ and $a_q = [a_{q1}, \ldots, a_{qN}]^T \in \mathbb{R}^N$ refers to system parameters that are actually stored within the computer while performing the iteration.

Kamal Premaratne and Peter H. Bauer gratefully acknowledge the support provided by the Office of Naval Research (ONR) through the grants N00014-94-1-0454 and N00014-94-1-0387, respectively.

δ-Operator Based Numerical Scheme. The proposed δ-operator based scheme for the same nonlinear system in (1) is

$$ \begin{aligned} \delta[x](n) &= f_\delta(x(n), a_\delta) && \text{(Intermediate equation)} \\ q[x](n) &= x(n) + \Delta \cdot \delta[x](n) && \text{(Update equation)}, \end{aligned} \tag{2} $$

where $\delta[x](n) = (q[x](n) - x(n))/\Delta$ and $f_\delta(x(n), a_\delta) = (f_q(x(n), a_q) - x(n))/\Delta$. Here, $\Delta$ is an arbitrary positive real parameter (usually the grid size) and $a_\delta = [a_{\delta 1}, \ldots, a_{\delta N}]^T \in \mathbb{R}^N$ again refers to system parameters that are actually stored within the computer.

Now, which of the schemes (1) or (2) yields superior coefficient sensitivity of its orbit with respect to perturbation of $a_q$ or $a_\delta$, respectively? This consideration is crucial in high performance, real-time applications that may require fast sampling/shorter wordlength. Of course, with infinite wordlength, both (1) and (2) yield identical results. In our development, the nonlinearity is taken to belong to $C^1$, that is, it possesses first partial derivatives. Small perturbations are assumed.
Contributions 

The contributions of this paper are the following:

1. Development of coefficient sensitivity measures $M_{FXP}$ and $M_{FLP}$ for fixed-point (FXP) and floating-point (FLP), respectively. These take into account that in FXP, coefficient perturbation is approximately independent of its nominal value, while in FLP, it is approximately proportional.

2. FXP: $M_{FXP}$ for the δ-system is $\Delta$ times $M_{FXP}$ for the q-system. Hence, the δ-system is superior under small grid size.

3.  FLP;  ilfr  'LP  for  6-system  is  superior  than  Mppp  for  ^-system  if  —  1|  <  |,  Vf  = 

1,...,A/.  Here,  indicates  the  ‘linear’  term  in  the  f-th  equation  of  f,.  We  show 
that typical digital equivalents of continuous-time nonlinear systems obtained under
fast sampling routinely satisfy this condition.

4.  Similar  comments  hold  true  for  linear  systems,  piecewise  nonlinear  systems,  and 
piecewise  linear  systems. 
Summary of Phase P1 Results

Phase P1 consists of two tasks:

[T1] Task T1: Analysis and design of finite wordlength implementations of linear, time-invariant δ-systems.

[T2] Task T3: 2-D and m-D δ-system models.

The major part of task T1 was carried out at the University of Notre Dame by Dr. Peter H. Bauer while the major part of task T3 was carried out at the University of Miami by Dr. Kamal Premaratne. The project being an extensive collaborative effort, the two PIs have been in constant contact during this research work.

The following is a summary of the phase P1 results.

Task T1: Analysis and Design of Finite Wordlength Implementations of Linear, Time-Invariant δ-Systems

The conclusions drawn from the work conducted for task T1 may be summarized as follows:

1. The Fixed-Point Arithmetic Case: When limit cycle performance is crucial, the q-operator implementation is preferable. The δ-operator implementation is superior with regard to coefficient sensitivity issues.

2. The Floating-Point Arithmetic Case: Generally, the δ-operator implementation outperforms its q-operator counterpart. In particular, in high-order and high-speed applications, the δ-operator implementation is the best choice.


Prior to a more detailed exposition, first we provide qualitative justification for the above conclusion. The state equations of a δ-operator system can be written as:

$$ \begin{aligned} \delta[x](n) &= A_\delta x(n) + B_\delta u(n); \\ q[x](n) &= x(n) + \Delta \cdot \delta[x](n), \end{aligned} \tag{T1.1} $$

where $x$ and $u$ are the state and input vectors, respectively. Here, $\Delta$ denotes a positive real constant (typically, the sampling time). The symbol $\delta[\cdot]$ denotes the δ-operator, that is,

$$ \delta[x](n) = \frac{q[x](n) - x(n)}{\Delta}, \quad \text{or} \quad \delta = \frac{q - 1}{\Delta}, \tag{T1.2} $$

and $q[\cdot]$ denotes the usual q-operator, that is,

$$ q[x](n) = x(n + 1). \tag{T1.3} $$

The corresponding formulation of (T1.1) in terms of the q-operator is

$$ q[x](n) = A_q x(n) + B_q u(n), \tag{T1.4} $$

where

$$ A_q = I + \Delta \cdot A_\delta \iff A_\delta = \frac{A_q - I}{\Delta} \quad \text{and} \quad B_q = \Delta \cdot B_\delta \iff B_\delta = \frac{B_q}{\Delta}. \tag{T1.5} $$
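The relationship between the δ-operator and q-operator system matrices can be checked with a short Python sketch. The helper names and the numerical values below are assumptions chosen for illustration; exact fractions are used so that both formulations can be compared without rounding effects.

```python
from fractions import Fraction

def delta_to_q(A_d, B_d, delta):
    """Convert delta-form matrices to shift form: A_q = I + delta*A_d, B_q = delta*B_d."""
    m = len(A_d)
    A_q = [[(1 if i == j else 0) + delta * A_d[i][j] for j in range(m)]
           for i in range(m)]
    B_q = [[delta * b for b in row] for row in B_d]
    return A_q, B_q

def matvec(M, v):
    return [sum(a * b for a, b in zip(row, v)) for row in M]

def step_q(A_q, B_q, x, u):
    """Shift-operator update: x(n+1) = A_q x(n) + B_q u(n)."""
    return [p + r for p, r in zip(matvec(A_q, x), matvec(B_q, u))]

def step_delta(A_d, B_d, x, u, delta):
    """Delta-operator update: intermediate equation followed by update equation."""
    dx = [p + r for p, r in zip(matvec(A_d, x), matvec(B_d, u))]
    return [xi + delta * di for xi, di in zip(x, dx)]

# Hypothetical second-order example.
A_d = [[-1, 2], [0, -3]]
B_d = [[1], [1]]
delta = Fraction(1, 8)
A_q, B_q = delta_to_q(A_d, B_d, delta)
x, u = [1, 2], [4]
print(step_q(A_q, B_q, x, u), step_delta(A_d, B_d, x, u, delta))
```

In exact arithmetic both updates return the same next state; the finite wordlength behavior discussed in the text arises from where quantization enters each of the two forms.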

Now, given $x$ and $u$, both representations compute $q[x]$ with a certain accuracy. Consider the δ-operator formulation in (T1.1). Here we encounter two errors:

1. The first is due to the computation of $\delta[x]$, that is, the first equation in (T1.1). We will refer to this equation as the intermediate equation.

2. The second is due to the eventual computation of $q[x]$, that is, the second equation in (T1.1). We will refer to this equation as the update equation.

Let us assume that the total error in computing $q[x]$ is mainly due to the intermediate equation in (T1.1) (rather than the update equation). Since the update equation multiplies $\delta[x]$ by $\Delta$, the error of the intermediate equation is scaled down by $\Delta$. Then, by choosing $\Delta$ sufficiently small, the total error in computing $q[x]$ will be approximately the error created by the update equation, which is small. In this case, the δ-operator representation has better finite wordlength properties than its q-operator counterpart in (T1.4).

If, however, the errors accumulated in the intermediate and the update equations in (T1.1) are comparable, $q[x]$ computed through the δ-operator representation will show approximately the same error as that computed through its q-operator counterpart, assuming $\Delta$ is sufficiently small. If $\Delta$ is not sufficiently smaller than one, the δ-operator representation will actually perform worse than the q-operator representation!

If the error introduced in the update equation is larger than that in the intermediate equation, the δ-operator representation would consistently perform worse! In reality, this case is very unlikely to occur.

Next, a more detailed exposition follows.

T1.1 The Fixed-Point Arithmetic Case

We now discuss some of the results regarding the fixed-point (FXP) case. Here, our results in fact indicate that, in case limit cycle behavior is crucial, the δ-operator representation is NOT suitable with this arithmetic scheme [1]. Such a case may occur when nonlinear systems are implemented through FXP δ-operator based schemes.

Zero-input limit cycles. Independent of $\Delta$, zero-input limit cycles cannot be avoided in FXP δ-implementations. This is easily explained as follows: If $\Delta$ is chosen very small, the contribution from the intermediate equation is small (since $\delta[x]$ is multiplied by $\Delta$), so that, during the update equation, $q[x]$ can be quantized to $x$, creating a DC limit cycle; that is, an incorrect equilibrium point different from zero results. We emphasize that most of the desirable properties of δ-operator implementations are based on a small $\Delta$. We may also show that, if $\Delta$ is chosen larger (this case is of course somewhat less important), DC limit cycles will still exist. Hence, δ-operator representations cannot be implemented limit cycle free in FXP format! This fact is independent of the particular realization of the system.

Deadband size. Since δ-systems cannot be implemented limit cycle free in FXP format,
it is of interest to investigate the size of such limit cycles since, in certain situations,
sufficiently small limit cycle amplitudes can be tolerated. It can be shown that the magnitude of
Δ determines the magnitude of the limit cycle. The smaller the Δ, the larger will be the
deadband and hence the limit cycle magnitude. An approximate relationship regarding
this is

Δ × size of deadband = 1, (T1.6)

where the size of deadband is measured in multiples of the quantization step size. Here,
the deadband corresponds to that obtained by considering the quantization of Δ · δ[x].
Therefore, the usual choice of a small Δ creates a larger deadband!
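The relationship (T1.6) can be illustrated with a small numerical sketch. The setup below is an assumption made for illustration only and is not taken from the report: a first-order zero-input δ-system with a_δ = -1 (stored exactly), the state held as an integer number of quantization steps, Δ = 2^(-k), and magnitude truncation applied to the product Δ · δ[x]. The function name `zero_input_fixed_point` is hypothetical.

```python
def zero_input_fixed_point(x0: int, k: int, max_steps: int = 100_000) -> int:
    """Iterate x(n+1) = x(n) + Q[Delta * delta[x](n)] until the state stops moving.

    Assumptions (illustrative only): the state is an integer count of
    quantization steps, Delta = 2**-k, the intermediate equation is
    delta[x] = -x (a_delta = -1), and Q is magnitude truncation.
    """
    x = x0
    for _ in range(max_steps):
        delta_x = -x                        # intermediate equation (exact here)
        magnitude = abs(delta_x) >> k       # |Delta * delta[x]| truncated to whole steps
        increment = magnitude if delta_x >= 0 else -magnitude
        if increment == 0:                  # update equation returns x unchanged
            return x
        x += increment                      # update equation
    return x


if __name__ == "__main__":
    for k in (2, 4, 6):
        stuck = zero_input_fixed_point(1000, k)
        print(f"Delta = 2^-{k}: state sticks at {stuck} quantization steps")
```

Under these assumptions the state settles at a nonzero value just below 1/Δ quantization steps, so halving Δ roughly doubles the DC offset, which is the trend (T1.6) describes.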

The input driven case. Although the input driven case is not part of the originally
proposed work, some interesting results have been obtained. For small values of Δ, there
exists a bounded input signal that does not allow control of the state trajectory. In other
words, given sufficiently small Δ, the state trajectory may not be influenced by such an
input signal.

The influence of the realization. First, it was necessary to develop a suitable scheme
to investigate the effect of realization on the presence or absence of limit cycles. In this
direction, for the δ-operator case, a computer-based exhaustive search algorithm that checks
for limit cycles (DC and/or oscillatory) has been developed [5].
As discussed before, we have shown that a stable linear time-invariant δ-system cannot
be implemented limit cycle free in FXP. The size of the deadband, however, also depends on
the particular realization, that is, the structure of A_δ. Given a system transfer function,
there are forms which minimize this deadband size with respect to some appropriately
chosen measure. For example, in order to minimize DC limit cycle amplitude, one may
choose the normal form (in terms of A_δ) as a suitable candidate.

The influence of quantization nonlinearity and its deadzone. Since a larger deadzone
implies larger DC limit cycle amplitudes, the use of quantizers with reduced, or even
zero, deadzone was proposed. In investigating first-order systems, it was found that
reducing the deadzone can indeed reduce the occurrence of DC limit cycles.
Unfortunately, other oscillatory limit cycles will be created. This phenomenon is due to
the increased gain exhibited towards small input signals by the quantizer.

Scaling. As discussed above, we have shown that, independent of either the form of
A_δ or the magnitude of Δ, a FXP implemented δ-system cannot be free of zero-input limit
cycles. Hence, scaling cannot be offered as a possible solution.

T1.2  The  Floating-Point  Arithmetic  Case 

The floating-point (FLP) implementation of δ-systems is currently under investigation.
The results obtained so far are very encouraging, and indicate that quantization errors
due to FLP arithmetic have a much smaller effect on the system behavior than in the FXP
case. In fact, preliminary results show that, for δ-systems of order three and higher, errors
in computing q[x] can be made significantly smaller than for the corresponding q-systems.
This is because, for a FLP implementation of such a system, errors created through the
intermediate equation are larger than those created through the update equation. As
previously mentioned, in this situation, δ-systems behave better than their q-operator
counterparts!

Limit cycles. In FLP arithmetic, a linearly stable time-invariant system, under zero-input
conditions, may exhibit four types of responses: a diverging response, an oscillatory
periodic response of arbitrary magnitude, an oscillatory periodic response in underflow,
or an asymptotically stable response. Only the last two response types are acceptable in
practice. It is well known that the last response type is in fact a very stringent requirement
and is often not required in practice. Results so far obtained show that, when the
requirements for a response in underflow are compared, the δ-system requires less wordlength
than its q-system counterpart! This advantage in fact grows with the order of the system!
Once the system reaches underflow conditions, the δ-system again exhibits DC limit
cycles. However, if the exponent register is chosen sufficiently large, the amplitude of these
oscillations can be made extremely small and hence, for all practical purposes, this problem
is solved.

Deadband  size.  If  the  condition  on  the  mantissa  length  that  guarantees  convergence 
into  underflow  is  satisfied,  then  the  deadband  size  will  be  very  small.  Hence,  it  can  be 
neglected  for  all  practical  purposes.  This  assumes  a  properly  chosen  exponent  register 
length  since  the  exponent  register  length  determines  the  dynamic  range  of  underflow. 

The Influence of the Nonlinearity. Unlike the FXP case, the characteristic of the
nonlinearity has only a minor effect on the system behavior, significant differences being
present only in underflow conditions.

The Underflow case. In underflow, the δ-system seems to behave worse than its
q-operator counterpart. This is mainly due to the fact that a FLP system in underflow
essentially performs very similarly to a FXP system. However, as mentioned above, if the
dynamic range of underflow is chosen properly, the system behavior in underflow is of little
practical interest.

Block Floating-Point Arithmetic. Even for the q-operator case, results regarding block
FLP implementations are lacking. Hence, investigations regarding block FLP implementation
of δ-systems are in their early stages. In order to obtain a comparison between the two
types of implementations, current research is geared towards obtaining results applicable
for the q-operator case.

T1.3 The Multi-Dimensional Case

The results on one-dimensional (1-D) δ-operator implementations in FXP arithmetic
directly carry over to the multi-dimensional (m-D) case. The existence of non-converging
responses along the boundary of the causality region can easily be proven using the same
type of argument used in the 1-D case. Consequently, δ-operator based implementations
of m-D systems cannot be implemented limit cycle free in FXP.

Task T3: 2-D and m-D δ-system models

Discrete-time systems implemented using the δ-operator, as is clear from the discussion
above, exhibit superior finite wordlength properties with FLP arithmetic. In the case of
FXP arithmetic, they still provide superior coefficient sensitivity. The development of 2-D
and m-D models applicable for δ-operator implementations was hence motivated with the
expectation that these properties would still hold true.

The conclusions drawn from the work conducted for task T3 may be summarized as
follows: Similar to the 1-D case, under FLP arithmetic, the δ-operator implementation of
2-D and m-D discrete-time systems provides the best choice. Again, this is particularly
true in high-order and high-speed applications.


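State-space models. The Roesser local state-space (s.s.) model of q-operator formulated 2-D
discrete-time systems takes the form

$$
\begin{bmatrix} q_h[x^h](i,j) \\ q_v[x^v](i,j) \end{bmatrix}
= \begin{bmatrix} A_q^{(1)} & A_q^{(2)} \\ A_q^{(3)} & A_q^{(4)} \end{bmatrix}
\begin{bmatrix} x^h(i,j) \\ x^v(i,j) \end{bmatrix}
+ \begin{bmatrix} B_q^{(1)} \\ B_q^{(2)} \end{bmatrix} u(i,j)
= [A_q] \begin{bmatrix} x^h(i,j) \\ x^v(i,j) \end{bmatrix} + [B_q]\, u(i,j);
$$

$$
y(i,j) = \begin{bmatrix} C_q^{(1)} & C_q^{(2)} \end{bmatrix}
\begin{bmatrix} x^h(i,j) \\ x^v(i,j) \end{bmatrix} + [D_q]\, u(i,j),
\tag{T3.1}
$$

where A_q^{(1)} is of size n_h × n_h, A_q^{(2)} is of size n_h × n_v, etc. Also, q_h and q_v denote the
horizontal and vertical shift operators, that is,

$$
q_h[x](i,j) = x(i+1,j) \quad \text{and} \quad q_v[x](i,j) = x(i,j+1). \tag{T3.2}
$$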


To exploit the advantages of δ-operator implementations, analogous to the 1-D case,
we define the operators

$$
\delta_h[x](i,j) = \frac{x(i+1,j) - x(i,j)}{\Delta_h} = \frac{q_h[x](i,j) - x(i,j)}{\Delta_h};
$$

$$
\delta_v[x](i,j) = \frac{x(i,j+1) - x(i,j)}{\Delta_v} = \frac{q_v[x](i,j) - x(i,j)}{\Delta_v},
\tag{T3.3}
$$


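where Δ_h and Δ_v are two positive real constants. The corresponding δ-operator s.s. model
may then be obtained as

$$
\begin{bmatrix} \delta_h[x^h](i,j) \\ \delta_v[x^v](i,j) \end{bmatrix}
= \begin{bmatrix} A_\delta^{(1)} & A_\delta^{(2)} \\ A_\delta^{(3)} & A_\delta^{(4)} \end{bmatrix}
\begin{bmatrix} x^h(i,j) \\ x^v(i,j) \end{bmatrix}
+ \begin{bmatrix} B_\delta^{(1)} \\ B_\delta^{(2)} \end{bmatrix} u(i,j)
= [A_\delta] \begin{bmatrix} x^h(i,j) \\ x^v(i,j) \end{bmatrix} + [B_\delta]\, u(i,j);
$$

$$
y(i,j) = \begin{bmatrix} C_\delta^{(1)} & C_\delta^{(2)} \end{bmatrix}
\begin{bmatrix} x^h(i,j) \\ x^v(i,j) \end{bmatrix} + [D_\delta]\, u(i,j)
= [C_\delta] \begin{bmatrix} x^h(i,j) \\ x^v(i,j) \end{bmatrix} + [D_\delta]\, u(i,j).
\tag{T3.4}
$$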
This is the 2-D version of the intermediate equation mentioned earlier. In addition, as for
the 1-D case, we have the following update equations:

$$
q_h[x^h](i,j) = x^h(i,j) + \Delta_h \cdot \delta_h[x^h](i,j);
$$

$$
q_v[x^v](i,j) = x^v(i,j) + \Delta_v \cdot \delta_v[x^v](i,j).
\tag{T3.5}
$$


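Note that,

$$
\begin{aligned}
A_q &= I + \Delta \cdot A_\delta &&\Longleftrightarrow& A_\delta &= \Delta^{-1} \cdot (A_q - I); \\
B_q &= \Delta \cdot B_\delta &&\Longleftrightarrow& B_\delta &= \Delta^{-1} \cdot B_q; \\
C_q &= C_\delta &&\Longleftrightarrow& C_\delta &= C_q; \\
D_q &= D_\delta &&\Longleftrightarrow& D_\delta &= D_q.
\end{aligned}
$$

Here, Δ = [Δ_h I_{n_h} ⊕ Δ_v I_{n_v}] is of size (n_h + n_v) × (n_h + n_v).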
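The conversion between the two parameterizations can be sketched in a few lines. The code below is an illustration under stated assumptions, not the report's software: matrices are plain nested lists, the first n_h state rows belong to the horizontal state and the rest to the vertical state, and the function names `q_to_delta` and `delta_to_q` are hypothetical. Because Δ is block diagonal, multiplying by Δ or its inverse from the left simply scales each row by Δ_h or Δ_v.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def _row_steps(n: int, n_h: int, delta_h: float, delta_v: float) -> List[float]:
    # Diagonal of Delta = Delta_h * I_{n_h} (+) Delta_v * I_{n_v}
    return [delta_h if r < n_h else delta_v for r in range(n)]


def q_to_delta(a_q: Matrix, b_q: Matrix, n_h: int,
               delta_h: float, delta_v: float) -> Tuple[Matrix, Matrix]:
    """Return (A_delta, B_delta) = (Delta^-1 (A_q - I), Delta^-1 B_q)."""
    n = len(a_q)
    step = _row_steps(n, n_h, delta_h, delta_v)
    a_d = [[(a_q[r][c] - (1.0 if r == c else 0.0)) / step[r] for c in range(n)]
           for r in range(n)]
    b_d = [[entry / step[r] for entry in b_q[r]] for r in range(n)]
    return a_d, b_d


def delta_to_q(a_d: Matrix, b_d: Matrix, n_h: int,
               delta_h: float, delta_v: float) -> Tuple[Matrix, Matrix]:
    """Return (A_q, B_q) = (I + Delta A_delta, Delta B_delta)."""
    n = len(a_d)
    step = _row_steps(n, n_h, delta_h, delta_v)
    a_q = [[(1.0 if r == c else 0.0) + step[r] * a_d[r][c] for c in range(n)]
           for r in range(n)]
    b_q = [[step[r] * entry for entry in b_d[r]] for r in range(n)]
    return a_q, b_q
```

The C and D matrices are unchanged by the conversion, so they are left out of the sketch.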

The associated system theoretic notions, such as transition matrix, transfer function,
characteristic equation, etc., have also been introduced. This s.s. model is the basis for
designing 2-D filters with superior finite wordlength properties. The design procedures
developed are expected to be extremely useful in obtaining high-Q 2-D and m-D digital
filters that are suitable for high-speed applications.

Stability. In the 1-D case, it has been shown that direct techniques with no recourse
to transformations (that first convert a given δ-system to its q-system counterpart) can
provide numerically more reliable stability checking algorithms. With this in mind, for the
2-D case, a direct stability checking technique applicable to the corresponding δ-system
transfer function has been introduced. For this purpose, a recently developed tabular form
was extended to the complex coefficient case and the notion of Schur-Cohn minors was
introduced to the δ-operator case.

Gramians and balanced realization. The notions of reachability and observability
gramians and balanced realization have been introduced for the δ-operator case. In order
to do this, first, the relationship between the gramians for the δ- and q-operator cases, as
defined in the literature, was established. The reachability and observability gramians,
that is, P and Q, respectively, for 1-D δ-systems were found to satisfy

$$
P = \frac{1}{2\pi j} \oint_{\Gamma_\delta} (cI - A_\delta)^{-1} B_\delta B_\delta^{*} (c^{*}I - A_\delta^{*})^{-1} \, \frac{dc}{1 + \Delta c};
\tag{T3.6}
$$

$$
Q = \frac{1}{2\pi j} \oint_{\Gamma_\delta} (c^{*}I - A_\delta^{*})^{-1} C_\delta^{*} C_\delta (cI - A_\delta)^{-1} \, \frac{dc}{1 + \Delta c},
\tag{T3.7}
$$

where Γ_δ is the stability boundary applicable for δ-systems, that is, Γ_δ = {c ∈ ℂ : |c +
1/Δ| = 1/Δ}. An extension of this is then used to define the 2-D gramians of δ-systems
represented in the Roesser model developed above.
For the important class of separable (that is, separable-in-denominator) systems, it
is shown that these gramians may be computed through the solution of four Lyapunov
equations. These notions and results are useful in many applications, such as in extracting
reduced order models of δ-systems.

Sensitivity. Measures that indicate coefficient sensitivity of the δ-models developed
above have been introduced. Unlike what is available in the literature, this development is
applicable to the MIMO case as well. With these sensitivity measures as a guide,
development of minimum sensitivity structures has been carried out. The connection with the
corresponding balanced realizations has been pointed out.

Roundoff  noise.  With  the  use  of  a  noise  model  that  takes  into  account  the  roundoff 
error  propagation  in  the  s.s.  model  developed  above,  structures  that  minimize  roundoff 
noise  have  been  developed. 
SUMMARY OF PHASE P2 RESULTS

The work described in this report is related to the following:

[T2] Task T2: Analysis of nonlinear circuits through δ-operator based schemes.

Problems Posed in Task T2:

Regarding the proposals associated with the above grants, within Task T2, the following
questions were raised:

1. With the desirable properties of δ-systems applicable to linear systems in mind, do
the same properties carry over if nonlinear systems are implemented with δ-operator based
schemes?

2. In particular, issues concerning coefficient sensitivity and quantization noise are of
special importance in such systems.

3. If a δ-operator based scheme offers significant improvements over its q-operator
counterpart, the consequences in nonlinear signal processing, nonlinear control, and digital
simulation of nonlinear dynamics can be significant.

In fact, the superior finite wordlength performance of the discrete simulation of Chua's
Circuit in the grant proposals using the δ-operator, instead of the more conventional
q-operator, provided the impetus for the work proposed in Task T2. The work described
herein justifies our preliminary optimism and shows that this superior performance can be
expected with δ-operator based implementations.
This task was proposed to be carried out during Phase P2 with close collaboration
between the two PIs. During the whole project duration, both PIs have been in constant
contact. In particular, a considerable portion of the work described herein was seen
to maturity during a one-week research stay at University of Notre Dame during August
09-16, 1994. During this time, important results that address coefficient sensitivity and
quantization error bounds applicable to δ-operator based implementation of nonlinear
systems were developed. A description of those Phase P2 results pertaining to coefficient
sensitivity follows.

Task T2: Results Pertaining to Coefficient Sensitivity (Summary)

Briefly, conclusions drawn from this work may be summarized as follows: We have
investigated orbits of linear and nonlinear systems. Several important types of nonlinearities
(C¹ nonlinearities, piecewise C¹ nonlinearities, and piecewise linear) were looked into.

• The Fixed-Point Arithmetic (FXP) Case:

With small step size, δ-systems provide superior coefficient sensitivity performance.

• The Floating-Point Arithmetic (FLP) Case:

Conditions under which δ-systems provide superior coefficient sensitivity were derived.
Typical digital equivalents of nonlinear systems derived for simulation purposes in fact
routinely satisfy these conditions when the step size is small.
Task T2: Results Pertaining to Coefficient Sensitivity (Brief Description)

Consider the following q-operator based implementation of a nonlinear system:

$$
q[x](n) = f_q(x(n), a_q), \tag{1}
$$

where q[x](n) = x(n + 1). Here, x(n) is the state x ∈ ℝ^m at time instant n and
a_q = [a_{q_1}, ..., a_{q_M}]^T refers to the system parameters that are actually stored within the
computer.

The corresponding δ-operator based scheme of the same nonlinear system takes the form

$$
\begin{aligned}
\delta[x](n) &= f_\delta(x(n), a_\delta) && \text{(Intermediate equation)} \\
q[x](n) &= x(n) + \Delta \cdot \delta[x](n) && \text{(Update equation)}
\end{aligned}
\tag{2}
$$

where δ[x](n) = (q[x](n) - x(n))/Δ and

$$
f_\delta(x(n), a_\delta) = \frac{f_q(x(n), a_q) - x(n)}{\Delta}. \tag{3}
$$

Here, Δ ∈ ℝ is an arbitrary positive real parameter and a_δ = [a_{δ_1}, ..., a_{δ_M}]^T again
refers to the system parameters that are actually stored within the computer.
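The two-stage structure of (2) and (3) can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the map `example_f_q` is a hypothetical two-state nonlinearity chosen only for the demonstration, the helper names `make_f_delta` and `delta_step` are invented here, and exact rational arithmetic is used so that the sketch shows the infinite-wordlength equivalence of the two schemes and nothing about quantization.

```python
from fractions import Fraction
from typing import Callable, List

State = List[Fraction]


def make_f_delta(f_q: Callable[[State], State], delta: Fraction) -> Callable[[State], State]:
    """Build f_delta(x) = (f_q(x) - x) / Delta, following (3)."""
    def f_delta(x: State) -> State:
        return [(fq_i - x_i) / delta for fq_i, x_i in zip(f_q(x), x)]
    return f_delta


def delta_step(f_delta: Callable[[State], State], x: State, delta: Fraction) -> State:
    """One step of scheme (2): intermediate equation, then update equation."""
    d = f_delta(x)                                        # delta[x](n)
    return [x_i + delta * d_i for x_i, d_i in zip(x, d)]  # q[x](n) = x(n) + Delta * delta[x](n)


def example_f_q(x: State) -> State:
    # Hypothetical nonlinear map, used only to exercise the two schemes.
    x1, x2 = x
    return [Fraction(9, 10) * x1 + Fraction(1, 10) * x2 * x2,
            Fraction(1, 5) * x1 * x2 + Fraction(4, 5) * x2]


if __name__ == "__main__":
    delta = Fraction(1, 8)
    x = [Fraction(1, 2), Fraction(-3, 4)]
    print(delta_step(make_f_delta(example_f_q, delta), x, delta) == example_f_q(x))
```

With exact arithmetic the two schemes produce the same next state; the differences discussed in this report arise only once the stored coefficients and intermediate results are quantized.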


To see the relationship between a_q and a_δ, let the i-th equation in (1) be

$$
q[x_i](n) = f_{q_i}(x_1(n), \ldots, x_m(n), a_{q_1}, \ldots, a_{q_M}), \quad i = 1, \ldots, m. \tag{4}
$$

Then, we may encounter one of two situations:

1. There is a linear term corresponding to x_i, that is, a term of the nature a_{q_K} · x_i(n), on
the RHS of (4). Then, we need to store

$$
a_{\delta_i} =
\begin{cases}
a_{q_i}/\Delta, & \text{for } i \neq K; \\
(a_{q_i} - 1)/\Delta, & \text{for } i = K.
\end{cases}
\tag{5}
$$

2. There is no linear term corresponding to x_i on the RHS of (4). Then, we need to store

$$
a_{\delta_i} = a_{q_i}/\Delta \ \text{ for } i \in \{1, \ldots, M\}, \ \text{ and } \ 1/\Delta. \tag{6}
$$

Remark.

1. Of course, in an infinite wordlength implementation, there simply is no difference
between the q- and δ-operator based schemes in (1) and (2). In fact, the latter requires
a modest increase in the number of computations. However, what we address is the
performance under finite wordlength, high-speed conditions.

2. Discretization of a nonlinear system of the form

$$
x^{(i)}(t) = f(x(t), a) \tag{7}
$$

can give rise to equations of the type in (1) and (2). Here, x^{(i)} is the i-th derivative of x.

3. In what follows, f(x, a) ∈ C¹ denotes a nonlinear function that possesses first partial
derivatives.

Now, which of the schemes (1) or (2) yields superior coefficient sensitivity properties
of its orbit with respect to perturbations of a_q or a_δ, respectively? This consideration is
crucial in high-speed applications where a shorter wordlength is the avenue of choice.

In what follows, the following standing assumptions are made:

1. All perturbations are small.

2. Comparisons between q- and δ-operator based implementations are done with respect
to upper bounds (constructed through appropriate norms) on possible errors due to
coefficient sensitivity.


FXP CASE

In the FXP case, a good indication of the coefficient sensitivity of the orbit x is its first
partial derivative with respect to the stored coefficient vector a, that is,

$$
\left.\frac{\partial x}{\partial a}\right|_{n} = \frac{\partial}{\partial a}\, x(n). \tag{8}
$$

I. C¹ nonlinear system

q-operator case

For this case, we can show the following


Theorem 1. For the q-operator based implementation in (1),

5x 


n  — 1 


da„ 


n+1 


=  E n  [it 


dXm 


] 


j=o  \  i=j+l 

where I_M denotes the identity matrix of size M × M and


di. 

J  da, 

+ 

j  da. 

r  ^ 

L  dxi 


dfa 


dXr, 


dXm^^ 


fg(0]  e 


dfa 


da„ 


da„ 


fg(i)  6 


δ-operator case

For brevity, we only consider the case in (5). Note that,

$$
\left.\frac{\partial x}{\partial a_\delta}\right|_{n} = \Delta \cdot \left.\frac{\partial x}{\partial a_q}\right|_{n}. \tag{9}
$$

In addition, we need to consider the sensitivity of the orbit with respect to Δ (due to the
update equation in (2)). However, if we assume an exact FXP representation for Δ, this
term could be ignored.

Using, for instance, a norm to compare the sensitivity measures, we conclude that the
δ-operator based implementation will provide superior coefficient sensitivity performance.

Remark. We obtain similar results when considering the case in (6). Here, one may need
to consider sensitivity with respect to 1/Δ as well. Again, we may assume that 1/Δ (and
Δ) have exact FXP representations. Even if this is not the case, the δ-system is still likely to
be superior since the reduction in sensitivity gained through other terms is Δ-fold.
II. Linear system

The superior coefficient sensitivity of the frequency response of δ-operator based systems
is thoroughly investigated in Li and Gevers (1990). However, no result exists that addresses
the coefficient sensitivity of the orbit.

q-operator case

With the more general result in Theorem 1, we can show the following


Theorem 2. For the q-operator based implementation x(n + 1) = A_q x(n), A_q ∈ ℝ^{m×m},

n— 1 


dx 


dA„ 


=  T{I^®A^-^)U 

mXm  (/^  Gx)  +  U 

mXm 

j=0  ^ 


where  Ug^p  =  ELi  E  •=!  ®  ^ 

€  3?*^  is  the  unit  vector  with  1  on  its  i-th  row  (Brewer  1978). 


e  Here, 


δ-operator case

The corresponding δ-system's intermediate equation is δ[x](n) = A_δ x(n) where A_δ =
(A_q - I)/Δ. The update equation of course is as in (2).

Again, as in Section I, we can show that the δ-operator based implementation will
provide superior coefficient sensitivity performance.


III. Piecewise C¹ nonlinear system

Consider a nonlinearity that is piecewise C¹, that is, one that possesses first partial
derivatives within each 'piece'. To address its coefficient sensitivity, we model the dynamics
of such a system as follows:

1. Within each 'piece', the system dynamics is a C¹ nonlinearity.

2. Each instant of the orbit's 'entry' into another 'piece' is modeled as a perturbation in
the initial conditions.

Regarding item 1, as previous results indicate, the δ-operator based implementation
will be superior within each 'piece'. Regarding item 2, we need to investigate the orbit's
sensitivity with respect to initial conditions. This is addressed now.

q-operator case

A reasonable sensitivity measure is

$$
\left.\frac{\partial x}{\partial x(0)}\right|_{n} = \frac{\partial}{\partial x(0)}\, x(n) \in \mathbb{R}^{m \times m}. \tag{10}
$$
Then, we can show the following

Theorem 3. For the q-operator based implementation in (1),

dx 


5x(0) 


n+l 


dx„ 


t 


j=0 


A 


δ-operator case

One may show that Theorem 3 is equally applicable for the δ-operator case as well.

Hence, regarding sensitivity due to initial conditions, both q- and δ-operator based
implementations are expected to provide comparable results.

This implies that, in totality, δ-operator based implementations will provide superior
results.

IV. Piecewise linear system

Again, we address the sensitivity of the orbit with respect to the initial conditions.

q-operator case

As in Section II, with the more general result in Theorem 3, we can show the following

Theorem 4. For the q-operator based implementation of x(n + 1) = A_q x(n),

$$
\left.\frac{\partial x}{\partial x(0)}\right|_{n+1} = A_q^{\,n+1}, \quad n \geq 0.
$$

δ-operator case

Again, one may show that Theorem 4 is equally applicable for the δ-operator case as well.

Hence, as in Section III, δ-operator based implementations will provide superior
results.
FLP CASE

In the FLP case, representable values are spaced farther apart at higher values of the
parameter. Hence, instead of that used for the FXP case (see (8)), a more realistic sensitivity
measure is (see Li and Gevers (1990))

$$
\left.\frac{\partial x}{\partial a / a}\right|_{n} =
\begin{bmatrix}
\dfrac{\partial}{\partial a_1 / a_1}\, x(n) \\
\vdots \\
\dfrac{\partial}{\partial a_M / a_M}\, x(n)
\end{bmatrix}.
\tag{11}
$$


I. C¹ nonlinear system

q-operator case

For this case, we can show the following

Theorem 5. For the q-operator based implementation in (1),


dx 

daq / aq 


n-l 


Im  ® 


n 


dfq 

dx\ 


diq 

daq/aq 


+ 


dia 


daq  /  aq 


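δ-operator case

Again, we only consider the case in (5). Also, let us assume that the elements in a_q are
enumerated such that, for each i = 1, ..., m, a_{q_i} is the 'linear' element of the i-th equation.
Then, note that,

$$
\left.\frac{\partial x}{\partial a_\delta / a_\delta}\right|_{n}
= \begin{bmatrix}
a_{\delta_1} \dfrac{\partial x}{\partial a_{\delta_1}} \\
\vdots \\
a_{\delta_M} \dfrac{\partial x}{\partial a_{\delta_M}}
\end{bmatrix}
= \begin{bmatrix}
(a_{q_1} - 1)\, \dfrac{\partial x}{\partial a_{q_1}} \\
\vdots \\
(a_{q_m} - 1)\, \dfrac{\partial x}{\partial a_{q_m}} \\
a_{q_{m+1}}\, \dfrac{\partial x}{\partial a_{q_{m+1}}} \\
\vdots \\
a_{q_M}\, \dfrac{\partial x}{\partial a_{q_M}}
\end{bmatrix},
\tag{12}
$$

where we have used (5) and (9). As before, we may ignore the effect of Δ.

Again, we use a norm to compare the sensitivity measures. For instance, using the 1-
or ∞-norm, we conclude that the δ-operator based implementation will provide superior
coefficient sensitivity performance if

$$
|a_{q_i} - 1| < |a_{q_i}|, \quad \forall i = 1, \ldots, m. \tag{13}
$$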

But how practical is this restriction? In other words, how often, if at all, is it satisfied
in practice? To address this, consider the following

Example. Lorenz equation. Consider the state-space description of the Lorenz equation:

$$
\begin{aligned}
\dot{x}_1(t) &= -\sigma\,(x_1(t) - x_2(t)); \\
\dot{x}_2(t) &= \rho\, x_1(t) - x_2(t) - x_1(t)\, x_3(t); \\
\dot{x}_3(t) &= x_1(t)\, x_2(t) - \beta\, x_3(t).
\end{aligned}
$$

For digital simulation of the corresponding orbit, we use the forward Euler scheme with
step size Δ. This yields

$$
\begin{aligned}
x_1(n+1) &= (1 - \Delta\sigma)\, x_1(n) + \Delta\sigma\, x_2(n); \\
x_2(n+1) &= \Delta\rho\, x_1(n) + (1 - \Delta)\, x_2(n) - \Delta\, x_1(n)\, x_3(n); \\
x_3(n+1) &= (1 - \Delta\beta)\, x_3(n) + \Delta\, x_1(n)\, x_2(n).
\end{aligned}
$$

We at once observe the following: For a small step size Δ,

1. Coefficients of the linear terms are close to 1.

2. Other coefficients are very small.

Hence, the condition in (13) is in fact satisfied!

In fact, when digital simulation of nonlinear systems is carried out, (13) is often
satisfied for a small step size (which denotes fast sampling). Hence, we conclude that
a δ-operator based implementation of such a simulation will provide superior coefficient
sensitivity performance!
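Condition (13) is easy to check numerically for the forward Euler form of the Lorenz equation. The sketch below is illustrative only: the function names are hypothetical, and the parameter values σ = 10, β = 8/3 and Δ = 1e-3 in the usage lines are assumed here as typical values. Only the coefficient of x_i(n) in the i-th equation enters the condition, so ρ does not appear.

```python
from typing import List


def lorenz_linear_coefficients(sigma: float, beta: float, delta: float) -> List[float]:
    """Coefficient of x_i(n) in the i-th forward Euler equation (q-operator form)."""
    return [1.0 - delta * sigma, 1.0 - delta, 1.0 - delta * beta]


def satisfies_condition_13(linear_coefficients: List[float]) -> bool:
    """Check |a_qi - 1| < |a_qi| for every 'linear' coefficient."""
    return all(abs(a - 1.0) < abs(a) for a in linear_coefficients)


if __name__ == "__main__":
    small_step = lorenz_linear_coefficients(sigma=10.0, beta=8.0 / 3.0, delta=1e-3)
    large_step = lorenz_linear_coefficients(sigma=10.0, beta=8.0 / 3.0, delta=0.1)
    print(satisfies_condition_13(small_step))   # fast sampling
    print(satisfies_condition_13(large_step))   # coarse sampling
```

For each coefficient the condition amounts to a_qi > 1/2, which is why it holds for fast sampling and fails once Δσ reaches 1/2.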

II. Linear system

Again, no result that addresses coefficient sensitivity of the orbit of linear systems
implemented using FLP arithmetic is available.

Without delving into much detail, we simply state the relevant result: Consider the
q-operator based implementation x(n + 1) = A_q x(n) and its corresponding δ-operator
based implementation. With respect to the FLP coefficient sensitivity measure introduced
above, the coefficient sensitivity of the δ-system is superior (in terms of the norm being
used) to that of the corresponding q-system if

$$
\|A_q - I\| < \|A_q\|. \tag{14}
$$

It  is  not  hard  to  show  the  following: 

lAifyl,]  -  1|  <  |Ai[Ag]|,  Vi  =  1,. . .  ,m  4=^  HA,  -  J||f  <  m,!!/:’;  (15a) 

lA^A,]  -  1|  <  |A,[A,]i,  Vz,i  =  1, . . .  ,m  ||A,  -  /||2  <  (15b) 

[diagjA,]  -  1|  <  |diag,[Aj|,  Vi  =  l,...,m  <^=4>  p,  -  <  Pglli.oo-  (15c) 

Here, λ_i[A_q] denotes the i-th eigenvalue of A_q and diag_i[A_q] denotes the i-th diagonal
element of A_q.

Hence, if any one of the above conditions is satisfied, the δ-operator based
implementation will provide superior coefficient sensitivity performance.

Remark. Li and Gevers (1993) refer to the region in condition (15a) as the
Middleton-Goodwin (MG) Region. They have shown that, if (15a) is satisfied, the δ-operator based
implementation will provide superior coefficient sensitivity of its frequency response.

Regarding systems corresponding to those in Sections III and IV of the FXP case, δ-systems
offer similar advantages.

Example, continued. Lorenz equation. To justify and validate the results above, a digital
simulation of the Lorenz equation was carried out using both q- and δ-operator schemes
with FLP. The results are summarized in the series of graphs.

1. Nominal coefficient values: σ = 10; ρ = 28; β = 8/3.

2. Initial conditions: [x_1(0), x_2(0), x_3(0)]^T = [0, 5, 75]^T.

3. Coefficient representation: FLP arithmetic with the number of bits used for the
mantissa indicated on each graph.

4. Integration scheme: Forward Euler with step size Δ = 1e-03.

5. Number of time steps is 25,000.

6. Only projection onto the (x_1, x_2)-plane is shown.

It is important to note that, when only 4 bits are allowed on the mantissa, the qualitative
behavior of the q-system is completely different from what is expected. However, the
δ-system still provides satisfactory results. Hence, one may use a shorter wordlength for
coefficient representation with the latter without affecting performance. The implications
on speed, number of components, cost, reliability, etc., are obvious.
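One plausible reading of why so short a mantissa damages the q-form is sketched below. The quantizer is an assumption, not the format used for the graphs: it rounds to nearest and counts `bits` as the number of fractional bits of a mantissa normalized to [0.5, 1). The names `quantize_mantissa`, `effective_sigma_q` and `effective_sigma_delta` are hypothetical. In the q-form the stored coefficient 1 - Δσ sits very close to 1, so a short mantissa may round it to exactly 1 and remove σ from the first equation, whereas the δ-form stores -σ directly.

```python
import math


def quantize_mantissa(value: float, bits: int) -> float:
    """Round `value` to a float whose normalized mantissa has `bits` fractional bits."""
    if value == 0.0:
        return 0.0
    mantissa, exponent = math.frexp(value)    # value = mantissa * 2**exponent, 0.5 <= |mantissa| < 1
    scaled = round(mantissa * (1 << bits))    # keep `bits` bits of the mantissa (round to nearest)
    return math.ldexp(scaled / (1 << bits), exponent)


def effective_sigma_q(sigma: float, delta: float, bits: int) -> float:
    """sigma implied by the stored q-form coefficient 1 - Delta*sigma."""
    stored = quantize_mantissa(1.0 - delta * sigma, bits)
    return (1.0 - stored) / delta


def effective_sigma_delta(sigma: float, bits: int) -> float:
    """sigma implied by the stored delta-form coefficient (a_q - 1)/Delta = -sigma."""
    return -quantize_mantissa(-sigma, bits)


if __name__ == "__main__":
    for bits in (4, 8, 12):
        print(bits, effective_sigma_q(10.0, 1e-3, bits), effective_sigma_delta(10.0, bits))
```

The sketch only examines the stored linear coefficient of the first equation; it does not reproduce the simulations summarized in the graphs.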
HIGH SPEED FIXED- AND FLOATING-POINT
IMPLEMENTATION OF DELTA-OPERATOR
FORMULATED DISCRETE-TIME SYSTEMS


Peter H. Bauer and Kamal Premaratne


Focus: Effects of Quantization Errors in Nonlinear
Q- and Δ-Operator Systems


Abstract

Absolute quantization error bounds are constructed
for q- and δ-operator implementations of the nonlinear
system x_{n+1} = f(x_n). Various assumptions on the
type of the nonlinearity f(·) are made and both fixed
and floating point formats are investigated. A comparison
between the advantages and disadvantages of the
two implementation schemes is introduced. Finally, an
outlook concerning future work is given.


I. Absolute Bounds on Quantization Errors

1.1.  Nonlinearities  of  the  Polynomial  Type 

1.1.1.  Fixed  Point  Case 


Q- operator  case: 

•  System  description: 


where 


•  •  •  ,xm)  = 


j  = 


fi{xi,  •  • '  ,xm)  \ 


) 

T  -  ■  T  •  x'} 
n=o  tM=o 


Xm 


•  Assumptions: 

-  single  precision  (i.e.,  single  length  accumulators) 

-  quantization  step:  q 


•  Computed  Orbit: 
x(n) 


Error  model  for  the  computed  orbit: 

/;(ii , ■  •  • , iw)  =  E  ■  •  • , E  £{■■■  H 

*1=0  *M=0 


where 


Hj  —  5  +5  - 


(2) 


iM-Nj) 


and 


I  <  kq  (truncation) 

1 4?^  I  <  (rounding) 

If  the  nonlinearity  /j(-)  is  known,  the  number 
of  nonzero  terms  in  the  polynomial  is  known 
and  therefore  the  number  of  terms  in  the 
summations  of  e-terms.  Hence  an  absolute 
bound  on  fXj  can  be  constructed: 

\f^j\  <  Cjq  (truncation) 


where  Cj  =  li  -{-  2I2  - h  M  •  NjIm-Nj 

l^^i/  =  1,'  •  •  ,MNj  being  the  number  of  terms 
present  in  the  summation  Ei  ej^^ 


δ-operator case:


•  system  description: 


8x(ri) 

6xj(n) 


f  (^n  ) 

A 

Nj  Nj  jj)  .  . 

i^=0  ^ 

y  ...  y  .  xV  ■ 

*1=0  *M=0 


,  x(n  +  1)  =  x{n)  +  A<5[a:(n)]. 


iM 


A 


yM 

Xm 


where 


,0)  . 


.  -1 
n-«M _ 


n-«A/ 


for  {ii’)  *  •  *  j 'iiVf)  s.th. 
ij  =  1,  and  =  0  for 
1/  =  1,  •  •  •  ,M,  1/ 

otherwise  . 


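• Assumptions (same as in the q-operator case):

- single precision

- quantization step: q


• Computed orbit: $\hat{x}(n)$


Error model for the computed orbit:

$\delta[\hat{x}_j] = \sum_{i_1=0}^{N_j} \cdots \sum_{i_M=0}^{N_j} a^{(j),\delta}_{i_1 \cdots i_M}\, \hat{x}_1^{i_1} \cdots \hat{x}_M^{i_M} + \mu_j^{\delta}$

where

$\mu_j^{\delta} = \sum \epsilon_j^{(1)} + \sum \epsilon_j^{(2)} + \cdots + \sum \epsilon_j^{(M \cdot N_j)}$

and

$|\epsilon_j^{(k)}| < k \cdot q$ (truncation)
$|\epsilon_j^{(k)}| \le (k/2) \cdot q$ (rounding)

• Upper bound on $\mu_j^{\delta}$:

$|\mu_j^{\delta}| < (C_j + 1) \cdot q$ (worst case)
where $C_j$ is defined as in the q-operator case.

• Error in the computation of the next state:

$\hat{x}_j(n+1) = \hat{x}_j(n) + \Delta\,\delta[\hat{x}_j(n)] + \mu_{\Delta j}(n) = \hat{x}_j(n) + \Delta \sum_{i_1=0}^{N_j} \cdots \sum_{i_M=0}^{N_j} a^{(j),\delta}_{i_1 \cdots i_M}\, \hat{x}_1^{i_1} \cdots \hat{x}_M^{i_M} + \Delta \cdot \mu_j^{\delta} + \mu_{\Delta j}(n)$

where

$|\mu_{\Delta j}(n)| < q$ (truncation)

$|\mu_{\Delta j}(n)| \le q/2$ (rounding)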


Comparison: δ- vs. q-operator:


Error term bound for the q-operator:

$|\mu_j| \le C_j \cdot q$ (truncation)

Error term bound for the δ-operator:

$|\Delta\,\mu_j^{\delta}| + |\mu_{\Delta j}| < (C_j + 1)\,q\,\Delta + q = q\,([C_j + 1]\,\Delta + 1)$ (truncation)

• The δ-operator formulation has a smaller absolute
error bound for:

$\frac{C_j - 1}{C_j + 1} > \Delta$, for $j = 1, \ldots, M$

Usually $C_j \gg 1$, and hence for

$1 - \epsilon > \Delta$ (with $\epsilon = 2/(C_j + 1)$),

the δ-operator system is preferable.

(For high speed systems we typically have $\Delta \ll 1$.)
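
The comparison above can be checked numerically. The following is a minimal Python sketch, not taken from the original analysis: the function names and the sample values of $C_j$, q, and Δ are assumptions chosen only for illustration. It evaluates the two truncation bounds and the threshold $(C_j - 1)/(C_j + 1)$ below which the δ-operator bound is the smaller one.

```python
def q_operator_bound(c_j: float, q: float) -> float:
    """Truncation bound C_j * q on the q-operator error term."""
    return c_j * q


def delta_operator_bound(c_j: float, q: float, delta: float) -> float:
    """Truncation bound q * ((C_j + 1) * delta + 1) on the delta-operator error term."""
    return q * ((c_j + 1.0) * delta + 1.0)


def delta_threshold(c_j: float) -> float:
    """Value (C_j - 1) / (C_j + 1); the delta bound is smaller when delta is below it."""
    return (c_j - 1.0) / (c_j + 1.0)


if __name__ == "__main__":
    q = 2.0 ** -8   # hypothetical quantization step
    c_j = 9.0       # hypothetical constant C_j
    print("threshold on delta:", delta_threshold(c_j))
    for delta in (0.9, 0.5, 0.1, 0.01):
        print(delta, q_operator_bound(c_j, q), delta_operator_bound(c_j, q, delta))
```

As Δ shrinks, the δ-operator bound approaches q, the error of the single update-equation quantization, while the q-operator bound stays at $C_j q$.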




Reasonable choice of Δ (from an error bound perspective):


Remarks:

• δ-operator implementations in FXP format
seem to produce a significantly smaller
bound than q-operator implementations if
$\Delta \ll 1$.

• If $\Delta \ll 1$, a δ-operator implementation requires a
larger dynamic range than the q-operator
implementation, and the chance of
overflow increases.

• To avoid overflow problems, the δ-operator
system needs to be implemented with a
larger wordlength.

Forced Response Case:

System description for the q-operator:

$x_{n+1} = f(x_n, u_n) = \big(f_1(x_1, \ldots, x_M, u_1, \ldots, u_K), \ldots, f_M(x_1, \ldots, x_M, u_1, \ldots, u_K)\big)^T$

where the $f_j$'s are again multivariate polynomials in
up to M + K variables.

System description for the δ-operator:

$\delta[x_n] = \frac{f(x_n, u_n) - x_n}{\Delta}$

$x_{n+1} = x_n + \Delta\,\delta[x_n]$


• Using an error model similar to the one in the
zero-input case, the computation of $x_{n+1}$ in
the q-operator case and the computation of
$\delta x_n$ in the δ-operator case produce bounds of
similar magnitude.

• If $\Delta \ll 1$, the δ-operator system again has an
advantage over the q-operator system, since
the errors of the first equation, which are much
larger than the ones produced in the update
equation, are scaled by Δ.

1.1.2. The Floating Point Case
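q-operator model (ideal case):


$x_j(n+1) = \sum_{i_1} \cdots \sum_{i_M} a^{(j)}_{i_1 \cdots i_M}\, x_1^{i_1} \cdots x_M^{i_M}$    (1)

δ-operator model (ideal case):

$\delta x_j(n) = \sum_{i_1} \cdots \sum_{i_M} a^{(j),\delta}_{i_1 \cdots i_M}\, x_1^{i_1} \cdots x_M^{i_M}$    (2a)

$x_j(n+1) = x_j(n) + \Delta\,\delta[x_j(n)]$    (2b)

Model for floating point errors due to multiplication
and addition:


$x \otimes y = xy\,(1 + \epsilon)$

$x \oplus y = x\,(1 + \epsilon_1) + y\,(1 + \epsilon_2)$

Consider two cases:

(a) $a^{(j)}_{i_1 \cdots i_M} \approx 1$ with $i_\nu = 0$ for $\nu = 1, \ldots, M$, $\nu \ne j$, and $i_j = 1$;

$|a^{(j)}_{i_1 \cdots i_M}| \ll 1$ for all other combinations of $(i_1, \ldots, i_M)$;

$x(n) \in [-1, +1]^M$.

(b) Condition (a) is not satisfied.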

Case (a):


δ-operator bounds on the quantization error are much
smaller than q-operator bounds.

Qualitative explanation:

For case (a), all partial sums and products in
the computation of $\Delta \cdot \delta[x_j(n)]$ are much
smaller than $x_j(n)$. Therefore the errors in
the computation of $\Delta\,\delta[x_j(n)]$ are small
compared to the final addition error in (2b).
The δ-operator model thus implicitly
performs operand sorting, which is known to
reduce quantization errors in floating point
arithmetic.
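
To experiment with this effect, the scalar linear special case $x(n+1) = a_q x(n)$ with $a_q = 1 + \Delta a_\delta$ can be iterated in emulated single precision. The following Python sketch is an illustration only: the helper `f32`, the function names, and the parameter values are assumptions, and single precision is emulated by rounding each intermediate result through `struct`. It lets the q-form and the δ-form of (2a)-(2b) be compared against a double-precision reference.

```python
import struct


def f32(value: float) -> float:
    """Round a Python float to the nearest IEEE-754 single-precision value."""
    return struct.unpack("f", struct.pack("f", value))[0]


def simulate_q(a_q: float, x0: float, steps: int) -> float:
    """Iterate x(n+1) = a_q * x(n), rounding every result to single precision."""
    a, x = f32(a_q), f32(x0)
    for _ in range(steps):
        x = f32(a * x)
    return x


def simulate_delta(a_delta: float, delta: float, x0: float, steps: int) -> float:
    """Iterate dx = a_delta * x (2a), then x(n+1) = x(n) + delta * dx (2b)."""
    a, d, x = f32(a_delta), f32(delta), f32(x0)
    for _ in range(steps):
        dx = f32(a * x)            # intermediate equation (2a)
        increment = f32(d * dx)    # much smaller in magnitude than x
        x = f32(x + increment)     # update equation (2b): one final addition
    return x


if __name__ == "__main__":
    delta, a_delta, steps = 1e-4, -1.0, 10000   # hypothetical values
    a_q = 1.0 + delta * a_delta
    reference = a_q ** steps                     # double-precision reference
    x_q = simulate_q(a_q, 1.0, steps)
    x_d = simulate_delta(a_delta, delta, 1.0, steps)
    print("q-form relative error:    ", abs(x_q - reference) / reference)
    print("delta-form relative error:", abs(x_d - reference) / reference)
```

In the δ-form the coefficient $a_\delta$ and the increment are represented with full relative precision, so only the last addition is rounded at the scale of $x(n)$; in the q-form the stored $a_q$ is already rounded at that scale.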

Case (b):

δ-operator error bounds are slightly larger
than q-operator bounds.


Other classes of nonlinear systems exist which also
perform better using a δ-operator formulation. One
such class is the weakly nonlinear functions satisfying:

$f_j(x_1, \ldots, x_M) = x_j + \epsilon_j(x_1, \ldots, x_M)$

with $|\epsilon_j(x_1, \ldots, x_M)| \ll |x_j|$,

$j = 1, \ldots, M$.

(If $f_j(x_1, \ldots, x_M)$ is of polynomial type, the system has
to operate on a hypercuboid or another finite subspace
of $\mathbb{R}^M$, since polynomials cannot be weakly nonlinear in
the above sense for all $x_i \in \mathbb{R}$.)


Note:

If equations (1) or (2) arise from discretizing a
continuous-time system with a very short sampling time,
then the condition

$|x_j(n)| \gg |\Delta\,\delta x_j(n)|$

can be satisfied, giving the δ-operator formulation an
advantage over the q-operator.


1.1.3. A Generalized Delta-Operator Model for Linear
and Nonlinear Systems

Linear Case:

Assume the system is given by:

$x(n+1) = A_q x(n) + B_q u(n)$

Consider the modified δ-operator form:

$\delta[x(n)] = A_\delta x(n) + B_\delta u(n)$

$x(n+1) = A_0 x(n) + B_0 u(n) + \Delta\,\delta[x(n)]$

with $A_0$ and $B_0$ being the integer matrices closest to $A_q$
and $B_q$, respectively.

Advantages

• The dynamic range of the $A_\delta$ and $B_\delta$
matrices becomes smaller and the chance of
overflow is reduced.

• The delta-operator realization has the same
improved sensitivity as in the regular
delta-operator case.

• In floating point arithmetic, the condition
$A_q \approx I$ is not necessary for improved error
behavior of the delta-operator system.
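
A small sketch may help make the generalized form concrete. It assumes, by equating the two forms above, that $A_\delta = (A_q - A_0)/\Delta$; this relation is implied rather than stated in the text, and the function names and sample matrices below are hypothetical. Only the zero-input state update is shown.

```python
def split_generalized(a_q, delta):
    """Split A_q into (A_0, A_delta): A_0 = nearest-integer matrix, A_delta = (A_q - A_0) / delta."""
    a_0 = [[float(round(v)) for v in row] for row in a_q]
    a_d = [[(v - r) / delta for v, r in zip(row, row0)]
           for row, row0 in zip(a_q, a_0)]
    return a_0, a_d


def mat_vec(m, x):
    """Plain matrix-vector product for lists of lists."""
    return [sum(m_ij * x_j for m_ij, x_j in zip(row, x)) for row in m]


def step_generalized(a_0, a_d, delta, x):
    """Zero-input update x(n+1) = A_0 x(n) + delta * (A_delta x(n))."""
    base = mat_vec(a_0, x)
    dx = mat_vec(a_d, x)
    return [b + delta * d for b, d in zip(base, dx)]


if __name__ == "__main__":
    a_q = [[1.75, 0.25], [-0.25, 0.875]]   # hypothetical system matrix
    a_0, a_d = split_generalized(a_q, 0.125)
    print(a_0, a_d)
    print(step_generalized(a_0, a_d, 0.125, [1.0, 2.0]), mat_vec(a_q, [1.0, 2.0]))
```

Note that Python's `round` resolves exact .5 ties to the nearest even integer; any tie-breaking rule gives a valid "closest" integer matrix.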


Nonlinear  Case: 


An argument similar to the one in the linear case can be made
for weakly nonlinear systems of the form:

$x(n+1) = A_q x(n) + B_q u(n) + \epsilon(x(n), u(n))$

where

$\|\epsilon(x(n), u(n))\| \ll \|x(n)\|$

and $\|\epsilon(x(n), u(n))\| \ll \|u(n)\|$

1.2. Nonlinearities of Piecewise Linear Form


1.2.1. The Fixed Point Case

Although a piecewise linear continuous scalar function
can be represented as

$f(x) = m x + \sum_i \lambda_i\,|x - a_i| + b$,

a computationally more efficient realization is:


$f(x) = c_i x + d_i$ for $\underline{x}_i < x < \overline{x}_i$    (1)

Therefore, the resulting system $x_{n+1} = f(x_n)$ can be
written in the form of several linear state space equations
with a driving term, the driving terms being known
a priori:

δ-operator:

$\delta[x_n] = A_i^{\delta} x_n + u_i^{\delta}$

$x_{n+1} = x_n + \Delta\,\delta[x_n], \quad i = 1, \ldots, K$

q-operator:

$x_{n+1} = A_i^{q} x_n + u_i^{q}, \quad i = 1, \ldots, K$


Conclusion:

• For single precision (quantization after
products) the absolute error bounds for the
δ-operator realization are smaller than for
the q-operator realization.

• For double precision (quantization after
summation) the absolute error bounds for
the δ-operator realization are approximately
the same as for the q-operator.


1.2.2. The Floating Point Case


Due to (1), the system can be modeled as a time-variant
linear system with a known, piecewise constant
input. Therefore the same conclusions apply as in the
linear time-invariant case with regard to absolute error bounds:

• Generally, absolute error bounds of the δ-
and q-operator systems are of similar size.

• If the resulting A-matrices of the piecewise
linear system are all 'close' to the identity
matrix I, then the δ-operator system will
perform better than the q-operator system (see
comments in 1.1.2). This requires that the
driving terms are also small relative to the
states.


1.3. Sector Bounded Nonlinear Functions

1.3.1. The Fixed Point Case
System description:

$x_i(n+1) = \mathcal{F}_{i1}[a_{i1} x_1(n)] + \cdots + \mathcal{F}_{im}[a_{im} x_m(n)], \quad i = 1, \ldots, m$

Sector conditions on $\mathcal{F}_{ij}[\cdot]$:

$\underline{k}_{ij} \le \mathcal{F}_{ij}[x]/x \le \overline{k}_{ij}$

If $\epsilon_{ij}(n)$ is the error affiliated with the computation
of $\mathcal{F}_{ij}[a_{ij} x_j(n)]$, the response of the q- and δ-operator systems
can be absolutely bounded by the following majorant
systems:

q-operator:

$x_i^{+}(n+1) = \sum_{j=1}^{m} m_{ij}^{+}\,x_j^{+}(n) + \sum_{j=1}^{m} \epsilon_{ij}^{+}(n), \quad i = 1, \ldots, m$

where

$m_{ij}^{+} = \max\{|\underline{k}_{ij} a_{ij}|,\ |\overline{k}_{ij} a_{ij}|\}$

δ-operator:

$x_i^{+}(n+1) = \sum_{j=1}^{m} m_{ij}^{+}\,x_j^{+}(n) + \Delta \Big( \epsilon_i^{\Delta +}(n) + \sum_{j=1}^{m} \epsilon_{ij}^{+}(n) \Big) + \epsilon_i^{up+}(n), \quad i = 1, \ldots, m$

where

$\epsilon^{up+}(n) = |\epsilon_{up}(n)|$, $\epsilon_{up}(n)$: error occurring in the update equation
$\epsilon^{\Delta +}(n) = |\epsilon_{\Delta}(n)|$, $\epsilon_{\Delta}(n)$: error due to division by Δ.

Comparison:

The bound for the δ-operator implementation is lower
if

$\Delta \Big( \epsilon_i^{\Delta +}(n) + \sum_{j=1}^{m} \epsilon_{ij}^{+}(n) \Big) + \epsilon_i^{up+}(n) < \sum_{j=1}^{m} \epsilon_{ij}^{+}(n)$

• Since the bound for $|\epsilon^{+}(n)|$ is typically much
larger than for $|\epsilon_{\Delta}(n)|$ or $|\epsilon_{up}(n)|$,
the above condition is always satisfied for $\Delta \ll 1$.

• A similar result holds if the nonlinearities $\mathcal{F}$
enter the system in a different form, i.e., if
they have arguments which consist of partial
sums.

• A similar comparison arises for other fixed
point quantization formats.


1.3.2. The Floating Point Case


Generally, the δ-operator implementation is not superior
to the q-operator implementation if one compares
absolute error bounds. However, as stated before
(1.1.2, 1.2.2), if the condition

$|x_i(n)| \gg |\Delta\,\delta(x_i(n))|$

holds true for all states ($i = 1, \ldots, m$), then the δ-operator
implementation has a significantly smaller error bound.

A class of systems satisfying the above condition is
given by:


$x_i(n+1) = \mathcal{F}_{i1}[a_{i1} x_1(n)] + \cdots + \mathcal{F}_{im}[a_{im} x_m(n)], \quad i = 1, \ldots, m$


where


$k_{ij} \in [-\epsilon_{ij}, \epsilon_{ij}]$ for $i \ne j$,

$k_{ii} \in [1 - \epsilon_{ii}, 1 + \epsilon_{ii}]$ otherwise,

$|\epsilon_{ij}| \ll 1$ for $i, j = 1, \ldots, m$.

Again, such a system could arise from a continuous-time
system with a high sampling rate.


II. Comparison of Implementations


Criterion | FXP case | FLP case | BFLP case*
Quantization error bounds, general system | δ-operator is mostly superior | δ- and q-operator are comparable | similar to FXP case?
Quantization error bounds for a short sampling time in the discretization process | δ-operator is mostly superior | δ-operator is superior | similar to FXP case?
Limit cycles (linear case only) | δ-operator produces incorrect equilibria | limit cycles in underflow for both q- and δ-operator | similar to FLP case
Hardware requirements for small Δ | δ-operator requires longer registers than q-operator | independent of Δ |
Overflow effects | δ-operator is more likely to cause overflow | unlikely in both operators | similar to FLP case
General sensitivity | δ-operator superior | δ-operator better than or equal to q-operator | similar to FXP case?
Sensitivity for a short sampling time in the discretization process | δ-operator superior | δ-operator superior | similar to FXP case?

* Has not been analyzed in detail yet; expected results.

PRELIMINARIES


OPERATORS

• For $x \in \mathbb{R}^m$, $q[\cdot]$ is the operator

$q[x](n) = x(n+1)$.

• For $x \in \mathbb{R}^m$, $\delta[\cdot]$ is the operator

$\delta[x](n) = \frac{x(n+1) - x(n)}{\Delta} = \frac{q[x](n) - x(n)}{\Delta}$.

Here, Δ is a positive constant (usually the sampling time).

• $q[\cdot]$ and $\delta[\cdot]$ are related by

$q = 1 + \Delta\,\delta$.

q-OPERATOR BASED STATE-SPACE MODEL

q-operator based model $\{A_q, B_q, C_q, D_q\}$ of a linear, shift-invariant,
causal, p-input, q-output discrete-time system:

$q[x](n) = A_q x(n) + B_q u(n)$;
$y(n) = C_q x(n) + D_q u(n)$.


δ-OPERATOR BASED STATE-SPACE MODEL
Corresponding δ-operator based model $\{A_\delta, B_\delta, C_\delta, D_\delta\}$:
Intermediate equation


$\delta[x](n) = A_\delta x(n) + B_\delta u(n)$;
$y(n) = C_\delta x(n) + D_\delta u(n)$.


Update equation


$q[x](n) = x(n) + \Delta \cdot \delta[x](n)$.

• $\{A_q, B_q, C_q, D_q\}$ and $\{A_\delta, B_\delta, C_\delta, D_\delta\}$ are related by


$A_q = I + \Delta A_\delta$; $B_q = \Delta B_\delta$; $C_q = C_\delta$; $D_q = D_\delta$.
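
The relations $A_q = I + \Delta A_\delta$ and $B_q = \Delta B_\delta$ can be exercised with a short sketch. The code below is illustrative only (the function names and the sample matrices are assumptions): it converts a q-model to the corresponding δ-model and checks that one state update gives the same result in both forms when computed in ideal arithmetic.

```python
def q_to_delta(a_q, b_q, delta):
    """Return (A_delta, B_delta) with A_delta = (A_q - I) / delta, B_delta = B_q / delta."""
    n = len(a_q)
    a_d = [[(a_q[i][j] - (1.0 if i == j else 0.0)) / delta for j in range(n)]
           for i in range(n)]
    b_d = [[v / delta for v in row] for row in b_q]
    return a_d, b_d


def mat_vec(m, x):
    """Plain matrix-vector product for lists of lists."""
    return [sum(m_ij * x_j for m_ij, x_j in zip(row, x)) for row in m]


def step_q(a_q, b_q, x, u):
    """q-operator update: x(n+1) = A_q x(n) + B_q u(n)."""
    return [ax + bu for ax, bu in zip(mat_vec(a_q, x), mat_vec(b_q, u))]


def step_delta(a_d, b_d, delta, x, u):
    """Intermediate equation followed by the update x(n+1) = x(n) + delta * delta[x](n)."""
    dx = [ax + bu for ax, bu in zip(mat_vec(a_d, x), mat_vec(b_d, u))]
    return [xi + delta * di for xi, di in zip(x, dx)]


if __name__ == "__main__":
    a_q = [[0.75, 0.125], [0.0, 0.5]]   # hypothetical q-model
    b_q = [[0.25], [0.5]]
    a_d, b_d = q_to_delta(a_q, b_q, 0.25)
    print(step_q(a_q, b_q, [1.0, 1.0], [1.0]))
    print(step_delta(a_d, b_d, 0.25, [1.0, 1.0], [1.0]))
```

The output matrices $C$ and $D$ are shared by both models, so they are omitted here.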


[T1] LINEAR, SHIFT-INVARIANT DISCRETE-TIME SYSTEMS


*OBJECTIVE

• How do δ-systems perform under fast sampling/short wordlength
conditions? What are their properties regarding limit cycles,
quantization errors, coefficient sensitivity, and dynamic range?

*ACCOMPLISHMENTS

Both FXP (fixed-point) and FLP (floating-point) schemes are tackled.
*LIMIT CYCLES

• FXP case: δ-systems (with small Δ) always exhibit limit cycles.

• FLP case: Similar to q-systems, with sufficient mantissa length,
limit cycles occur only in underflow.

*QUANTIZATION ERROR PROPAGATION

• FXP case: δ-systems possess smaller bounds for quantization
after multiplication. Otherwise, q- and δ-systems are comparable.

• FLP case: In general, δ-systems are better than or equal to
q-systems. If the δ-system is the digital equivalent of a continuous-time
system with fast sampling, it offers superior performance.

*COEFFICIENT SENSITIVITY

• FXP case: δ-systems are superior with fast sampling.

• FLP case: In general, δ-systems are better than or equal to
q-systems. If the δ-system is the digital equivalent of a continuous-time
system with fast sampling, it offers superior performance.

*DYNAMIC RANGE CONSTRAINTS

• FXP case: If the δ-system is the digital equivalent of a continuous-time
system, q- and δ-systems are comparable. If a q-system
is simply converted to a δ-system, the latter requires a larger
dynamic range.

• FLP case: If the δ-system is the digital equivalent of a continuous-time
system, it is superior. If a q-system is simply converted to
a δ-system, the latter requires a slightly larger dynamic range.


LIMIT CYCLES


The ideal linear system is taken to be asymptotically stable. We
consider the zero input case.

FXP Case

A δ-system implementation, under finite wordlength, becomes

$\delta[x](n) = Q\{A_\delta x(n)\}$;
$q[x](n) = x(n) + Q\{\Delta \cdot \delta[x](n)\}$.

Here, $Q\{\cdot\}$ is the quantization nonlinearity.

Accomplishments

• δ-systems exhibit DC limit cycles if

$\Delta < 0.5$ for rounding; $\Delta < 1.0$ for truncation.

The fundamental reason for these limit cycles is the deadzone of the
quantizer. This creates deadbands for both δ[x] and x.

• In fact, limit cycle free δ-system implementations do not exist!

• A smaller sampling time Δ yields a larger deadband for δ[x].

• Construction of this deadband for various arithmetic schemes
has been performed.

• The structure of the system matrix $A_\delta$ has a major effect on the geometry of
the deadband for x.

• Reduction of the quantizer deadzone reduces the size of the deadband, thus
reducing the DC limit cycle amplitude. But this increases other
(oscillatory) limit cycles.

• Neither the use of unconventional quantization nonlinearities nor
scaling techniques overcome this difficulty.
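
The deadband mechanism can be reproduced with a scalar example. The following Python sketch is an illustration under stated assumptions: the quantizer (rounding to the nearest multiple of q, with ties resolved toward +infinity), the function names, and the values $A_\delta = -1$, $\Delta = 0.25$ are all chosen here and do not come from the original work. It iterates the two finite-wordlength equations above with zero input.

```python
import math


def quantize_round(value: float, q: float = 1.0) -> float:
    """Round to the nearest multiple of q (ties toward +infinity)."""
    return q * math.floor(value / q + 0.5)


def simulate_delta_fxp(a_delta: float, delta: float, x0: float,
                       steps: int, q: float = 1.0) -> float:
    """Iterate delta[x] = Q{A_delta x}, x(n+1) = x + Q{Delta * delta[x]}."""
    x = x0
    for _ in range(steps):
        dx = quantize_round(a_delta * x, q)      # intermediate equation
        x = x + quantize_round(delta * dx, q)    # update equation
    return x


if __name__ == "__main__":
    # The ideal system x(n+1) = 0.75 x(n) is asymptotically stable,
    # yet the quantized delta-form stalls at a nonzero state.
    print(simulate_delta_fxp(-1.0, 0.25, 10.0, 50))
```

With these values the state decreases from 10 to 2 quantization steps and then stays there, because the quantized increment $Q\{\Delta \cdot \delta[x]\}$ becomes zero once $|\Delta \cdot \delta[x]|$ falls inside the rounding deadzone.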

FLP Case

Accomplishments

• If the mantissa length is sufficiently large, the response will always
converge into underflow.

• Hence limit cycles may occur only in underflow. This is usually
acceptable if the dynamic range of underflow is sufficiently small
(that is, the smallest representable exponent is sufficiently small).


QUANTIZATION ERROR PROPAGATION

Quantization error propagation is investigated via error envelopes.


FXP Case

Accomplishments

• Error envelopes for δ-systems are lower than for the corresponding
q-systems if quantization occurs after multiplication. Otherwise,
they are comparable.


FLP Case

Accomplishments

• In general, error envelopes of δ-systems are better than or equal to
those of q-systems.

• However, when the q-system matrix $A_q$ is of the form

$A_q = I + \{\epsilon_{ij}\}$,

where the matrix elements $\epsilon_{ij}$ satisfy

$|\epsilon_{ij}| \ll 1$,

the δ-system provides superior performance. This situation occurs
when a digital equivalent of a continuous-time system is obtained
with fast sampling.

• In this situation, the δ-operator implementation achieves 'operand
sorting' (which is known to tremendously reduce quantization
errors in FLP realizations).

• Generalized versions of the δ-operator, which can tackle situations
where $A_q$ does not satisfy the above condition, have been
developed. These perform better than q-systems.


COEFFICIENT SENSITIVITY

Coefficient sensitivity is investigated via differential sensitivity
measures. Small perturbations are assumed.

• Frequency response sensitivity has been investigated by others.

• Time response or orbit sensitivity arises as a special case of our
work in Task [T2] below.

FXP Case

Accomplishments

• δ-systems offer superior performance, in particular with fast
sampling.

FLP Case

Accomplishments

• In general, δ-systems are better than or equal to the corresponding
q-systems.

• Conditions under which δ-systems perform better are derived. In
particular, if the δ-system is a digital equivalent of a continuous-time
system obtained with fast sampling, it offers superior
performance.

DYNAMIC RANGE CONSTRAINTS

FXP Case

Accomplishments

• If the δ-system is obtained by discretization of a continuous-time
system, the dynamic range requirements of corresponding q- and
δ-systems are comparable.

• If the δ-system is obtained by simply converting a q-system, it
typically requires a larger dynamic range, larger coefficient
registers, and larger accumulators.

FLP Case

Accomplishments

• Wordlength requirements for q- and δ-systems are comparable.

• If the δ-system is obtained by discretization of a continuous-time
system with fast sampling, its zero convergence can be guaranteed
with fewer bits.


[T2] DIGITAL SIMULATION OF NONLINEAR SYSTEMS


*OBJECTIVE

• Can one perform reliable digital simulations of nonlinear systems
using δ-operator based numerical schemes?

• If so, just as for linear systems, would one get superior finite
wordlength properties?

• The resulting impact and consequences in high performance
computing (for example, in digital simulation of nonlinear systems,
signal processing, and control) can be significant.


*ACCOMPLISHMENTS

Several important types of nonlinearities were considered.

*LIMIT CYCLES

This is quite similar to the linear case. See our work in Task [T1].

*QUANTIZATION ERROR PROPAGATION

• FXP case: Due to the possibility of incorrect equilibria, FXP
implementation is not recommended.

• FLP case: Conditions under which δ-systems are superior are
derived.

*COEFFICIENT SENSITIVITY

• FXP case: With small grid size, δ-operator based numerical
schemes are superior to the conventional q-operator schemes.

• FLP case: Conditions under which the coefficient sensitivity of
δ-systems is superior are derived. Typical digital equivalents of
nonlinear systems under small grid size routinely satisfy these
conditions.

*DYNAMIC RANGE CONSTRAINTS

This is quite similar to the linear case. See our work in Task [T1].


q-OPERATOR BASED NONLINEAR SYSTEM


$q[x](n) = f_q(x(n), a_q)$.

• $a_q$ denotes the coefficients that are actually stored
in the computer.

δ-OPERATOR BASED NONLINEAR SYSTEM
We propose the following:

Intermediate equation


$\delta[x](n) = f_\delta(x(n), a_\delta)$.


Update equation


$q[x](n) = x(n) + \Delta \cdot \delta[x](n)$.

• $\delta[x](n) = (q[x](n) - x(n))/\Delta$ and $f_\delta = (f_q - x)/\Delta$.

• Δ is an arbitrary positive constant (usually the grid size).

• $a_\delta$ denotes the coefficients that are actually stored.

QUANTIZATION ERROR PROPAGATION

FXP Case

Accomplishments

• δ-systems offer superior performance if quantization is performed
after multiplication or if polynomial nonlinearities of higher order
are to be implemented.

• However, in FXP, δ-systems may converge to incorrect equilibria
(see comments in [T1]). Hence, FXP implementation is not
recommended.

FLP Case

Accomplishments

• δ-systems show significantly reduced quantization error bounds
if $\delta[x](n) = f_\delta(x(n))$ where $\|\Delta \cdot f_\delta(x(n))\| \ll \|x(n)\|$.

• Under fast sampling, similar to the linear case, this condition
is routinely satisfied. Hence, δ-operator based discretization
schemes, in FLP, can drastically reduce quantization errors with
fast sampling.


COEFFICIENT SENSITIVITY

For this presentation, the nonlinearity is taken to belong to $C^1$, that is,
it possesses first partial derivatives. Small perturbations are assumed.


FXP Case

Coefficient perturbation is approximately independent of its nominal
value. Hence, a good sensitivity measure of orbit x is $\partial x/\partial a\,|_n = \partial x(n)/\partial a$.

Accomplishments

• Comparison of q- and δ-systems: $\partial x/\partial a_\delta\,|_n = \Delta \cdot \partial x/\partial a_q\,|_n$.
Hence, δ-operator based schemes offer superior coefficient
sensitivity when Δ is small.

• Similar comments hold true for linear systems, piecewise nonlinear
systems, and piecewise linear systems.
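
The factor Δ between the two fixed-point sensitivities follows from the chain rule applied to $a_q = 1 + \Delta a_\delta$, and it can be checked by finite differences on a scalar linear orbit. The sketch below is an illustration only: the function names, the step size of the central difference, and the values $a_\delta = -1$, $\Delta = 0.05$, $n = 50$ are assumptions.

```python
def orbit_q(a_q: float, x0: float, n: int) -> float:
    """State after n steps of x(k+1) = a_q * x(k)."""
    x = x0
    for _ in range(n):
        x = a_q * x
    return x


def orbit_delta(a_delta: float, delta: float, x0: float, n: int) -> float:
    """State after n steps of x(k+1) = x(k) + delta * (a_delta * x(k))."""
    x = x0
    for _ in range(n):
        x = x + delta * (a_delta * x)
    return x


def central_difference(f, a: float, h: float = 1e-6) -> float:
    """Numerical derivative of f at a."""
    return (f(a + h) - f(a - h)) / (2.0 * h)


if __name__ == "__main__":
    delta, a_delta, n = 0.05, -1.0, 50          # hypothetical values
    a_q = 1.0 + delta * a_delta
    s_q = central_difference(lambda a: orbit_q(a, 1.0, n), a_q)
    s_d = central_difference(lambda a: orbit_delta(a, delta, 1.0, n), a_delta)
    print("orbit sensitivity w.r.t. a_q:    ", s_q)
    print("orbit sensitivity w.r.t. a_delta:", s_d)
    print("ratio (expected close to delta): ", s_d / s_q)
```

Since a fixed-point coefficient perturbation has the same absolute size in both realizations, the smaller derivative with respect to $a_\delta$ translates directly into a smaller orbit perturbation.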


FLP Case

Coefficient perturbation is approximately proportional to its nominal
value. Hence, a good sensitivity measure of orbit x is

$\left.\frac{\partial x}{\partial a/a}\right|_n = \left[\frac{\partial}{\partial a_1/a_1}, \ldots, \frac{\partial}{\partial a_M/a_M}\right] x(n)$.


Accomplishments

• Comparison of q- and δ-systems: We have shown that δ-operator
based schemes offer superior coefficient sensitivity if

$|a_{ii_q} - 1| < |a_{ii_q}|, \quad \forall i = 1, \ldots, m$.

Here, $a_{ii_q}$ indicates the 'linear' term in the i-th equation of $f_q$.

• Similar comments hold true for linear systems, piecewise nonlinear
systems, and piecewise linear systems.


Example: Lorenz Equation

Consider the digital simulation of the Lorenz equation:

$x_1^{(1)}(t) = a_{11} x_1(t) + a_{12} x_2(t)$;

$x_2^{(1)}(t) = a_{21} x_1(t) + a_{22} x_2(t) + a_{213} x_1(t) x_3(t)$;
$x_3^{(1)}(t) = a_{33} x_3(t) + a_{312} x_1(t) x_2(t)$.

Here, $a_{11} = -\sigma$, $a_{12} = \sigma$, $a_{21} = \rho$, $a_{22} = -1$, $a_{213} = -1$, $a_{33} = -\beta$,
and $a_{312} = 1$.

q-operator based forward Euler scheme with Δ = 1e-04

$q[x_{1_q}](n) = a_{11_q} x_{1_q}(n) + a_{12_q} x_{2_q}(n)$;

$q[x_{2_q}](n) = a_{21_q} x_{1_q}(n) + a_{22_q} x_{2_q}(n) + a_{213_q} x_{1_q}(n) x_{3_q}(n)$;

$q[x_{3_q}](n) = a_{33_q} x_{3_q}(n) + a_{312_q} x_{1_q}(n) x_{2_q}(n)$.

Here, $a_{11_q} = 1 - \Delta\sigma$, $a_{12_q} = \Delta\sigma$, $a_{21_q} = \Delta\rho$, $a_{22_q} = 1 - \Delta$, $a_{213_q} = -\Delta$,
$a_{33_q} = 1 - \Delta\beta$, and $a_{312_q} = \Delta$.

δ-operator based forward Euler scheme with Δ = 1e-04

$\delta[x_{1_\delta}](n) = a_{11_\delta} x_{1_\delta}(n) + a_{12_\delta} x_{2_\delta}(n)$;

$\delta[x_{2_\delta}](n) = a_{21_\delta} x_{1_\delta}(n) + a_{22_\delta} x_{2_\delta}(n) + a_{213_\delta} x_{1_\delta}(n) x_{3_\delta}(n)$;

$\delta[x_{3_\delta}](n) = a_{33_\delta} x_{3_\delta}(n) + a_{312_\delta} x_{1_\delta}(n) x_{2_\delta}(n)$.

Here, $a_{11_\delta} = a_{11}$, $a_{12_\delta} = a_{12}$, $a_{21_\delta} = a_{21}$, $a_{22_\delta} = a_{22}$, $a_{213_\delta} = a_{213}$,
$a_{33_\delta} = a_{33}$, and $a_{312_\delta} = a_{312}$.

Simulation data

• Nominal coefficient values: σ = 10; ρ = 28; β = 8/3. This system
exhibits chaotic behavior.

• Initial conditions: $x_q(0) = x_\delta(0) = [0, 5, 75]^T$.

• Data type: Two simulations were implemented in C using both
FLOAT (32-bit FLP) and DOUBLE (64-bit FLP) data types.

• Comparison: DOUBLE simulations until 8 s (where the q- and
δ- DOUBLE schemes are identical) were taken as a benchmark
for comparison of FLOAT simulations. Clearly, the computed
orbit from the δ-scheme is more reliable for a longer duration!
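
The two forward Euler schemes can be written down directly from the coefficient lists above. The following Python sketch is not the authors' C program: it runs in double precision only, the function names are assumptions, and the initial condition is the one read from the slide. It is meant to show that the q- and δ-forms are algebraically the same recursion, so any difference between them comes from finite wordlength effects alone.

```python
SIGMA, RHO, BETA = 10.0, 28.0, 8.0 / 3.0   # nominal coefficient values
DELTA = 1e-4                               # grid size used in the example


def lorenz_delta(x):
    """Intermediate equation: delta[x](n) with the delta-coefficients a_ij."""
    x1, x2, x3 = x
    return (-SIGMA * x1 + SIGMA * x2,
            RHO * x1 - x2 - x1 * x3,
            -BETA * x3 + x1 * x2)


def step_delta(x):
    """Update equation: x(n+1) = x(n) + Delta * delta[x](n)."""
    return tuple(xi + DELTA * di for xi, di in zip(x, lorenz_delta(x)))


def step_q(x):
    """q-operator scheme with the coefficients a_ij_q = 1 + Delta*a_ij or Delta*a_ij."""
    x1, x2, x3 = x
    return ((1.0 - DELTA * SIGMA) * x1 + (DELTA * SIGMA) * x2,
            (DELTA * RHO) * x1 + (1.0 - DELTA) * x2 + (-DELTA) * x1 * x3,
            (1.0 - DELTA * BETA) * x3 + DELTA * x1 * x2)


def run(step, x0, n):
    """Apply a one-step map n times."""
    x = x0
    for _ in range(n):
        x = step(x)
    return x


if __name__ == "__main__":
    x0 = (0.0, 5.0, 75.0)
    print(run(step_q, x0, 100))
    print(run(step_delta, x0, 100))
```

Reproducing the FLOAT experiment would additionally require rounding every intermediate result to single precision, which this sketch does not attempt.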


Figure: Time response of state 2: x2 (DOUBLE); x2q (FLOAT); x2d (FLOAT)

[T3] 2-D AND m-D DISCRETE-TIME SYSTEMS


*OBJECTIVE

• Do the superior finite wordlength properties hold true if 2-D and
m-D discrete-time systems are implemented using the δ-operator?

• If so, such implementations are useful in high performance, real-time
applications that use fast sampling/short wordlength.

*ACCOMPLISHMENTS

*FUNDAMENTAL SYSTEM THEORETIC CONCEPTS

• δ-operator analog of the 2-D Roesser q-model.

• Notions of characteristic equation, transfer function, stability,
etc., have been developed.

• An algorithm to check stability and notions of gramians and balanced
realizations have been developed.

*COEFFICIENT SENSITIVITY

• FXP case: Balanced realizations possess 'minimum' coefficient
sensitivity.

• FLP case: Conditions under which δ-systems perform better are
derived. Typically, narrowband high speed digital filters satisfy
these requirements.

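FUNDAMENTAL SYSTEM THEORETIC CONCEPTS

Operators

• Define operators $q_h[\cdot]$ and $q_v[\cdot]$ as

$q_h[x](i,j) = x(i+1, j)$ and $q_v[x](i,j) = x(i, j+1)$.

• Propose operators $\delta_h[\cdot]$ and $\delta_v[\cdot]$ as

$\delta_h[x](i,j) = \frac{q_h[x](i,j) - x(i,j)}{\Delta_h}$ and $\delta_v[x](i,j) = \frac{q_v[x](i,j) - x(i,j)}{\Delta_v}$.

Here, $\Delta_h$ and $\Delta_v$ are positive constants (that are the counterparts
of sampling time).


q-Operator Based Roesser Model

q-operator based Roesser model $\{A_q, B_q, C_q, D_q\}$ of a linear, shift-invariant,
strictly causal, p-input, q-output 2-D discrete-time system:

$\begin{bmatrix} q_h[x^h](i,j) \\ q_v[x^v](i,j) \end{bmatrix} = \begin{bmatrix} A_q^{(1)} & A_q^{(2)} \\ A_q^{(3)} & A_q^{(4)} \end{bmatrix} \begin{bmatrix} x^h(i,j) \\ x^v(i,j) \end{bmatrix} + \begin{bmatrix} B_q^{(1)} \\ B_q^{(2)} \end{bmatrix} u(i,j) = [A_q] \begin{bmatrix} x^h(i,j) \\ x^v(i,j) \end{bmatrix} + [B_q]\,u(i,j)$;

$y(i,j) = \begin{bmatrix} C_q^{(1)} & C_q^{(2)} \end{bmatrix} \begin{bmatrix} x^h(i,j) \\ x^v(i,j) \end{bmatrix} + [D_q]\,u(i,j) = [C_q] \begin{bmatrix} x^h(i,j) \\ x^v(i,j) \end{bmatrix} + [D_q]\,u(i,j)$.


δ-Operator Based Roesser Model

We propose the following δ-operator based Roesser model:
Intermediate equation

$\begin{bmatrix} \delta_h[x^h](i,j) \\ \delta_v[x^v](i,j) \end{bmatrix} = \begin{bmatrix} A_\delta^{(1)} & A_\delta^{(2)} \\ A_\delta^{(3)} & A_\delta^{(4)} \end{bmatrix} \begin{bmatrix} x^h(i,j) \\ x^v(i,j) \end{bmatrix} + \begin{bmatrix} B_\delta^{(1)} \\ B_\delta^{(2)} \end{bmatrix} u(i,j) = [A_\delta] \begin{bmatrix} x^h(i,j) \\ x^v(i,j) \end{bmatrix} + [B_\delta]\,u(i,j)$;

$y(i,j) = \begin{bmatrix} C_\delta^{(1)} & C_\delta^{(2)} \end{bmatrix} \begin{bmatrix} x^h(i,j) \\ x^v(i,j) \end{bmatrix} + [D_\delta]\,u(i,j) = [C_\delta] \begin{bmatrix} x^h(i,j) \\ x^v(i,j) \end{bmatrix} + [D_\delta]\,u(i,j)$.


Update equation

$q_h[x^h](i,j) = x^h(i,j) + \Delta_h \cdot \delta_h[x^h](i,j)$;
$q_v[x^v](i,j) = x^v(i,j) + \Delta_v \cdot \delta_v[x^v](i,j)$.

• $\{A_q, B_q, C_q, D_q\}$ and $\{A_\delta, B_\delta, C_\delta, D_\delta\}$ are related by

$A_q = I + \tau A_\delta$; $B_q = \tau B_\delta$; $C_q = C_\delta$; $D_q = D_\delta$.


Here, $\tau = [\Delta_h I \oplus \Delta_v I]$.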
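
Because τ is a direct sum of scaled identity blocks, it is diagonal, and the conversion from the δ-model to the q-model scales each row of $A_\delta$ and $B_\delta$ by either $\Delta_h$ or $\Delta_v$. The sketch below illustrates this under assumed names and sample values (one horizontal and one vertical state); it is not taken from the original work.

```python
def tau_diagonal(delta_h: float, delta_v: float, n_h: int, n_v: int):
    """Diagonal of tau: delta_h for the n_h horizontal states, delta_v for the n_v vertical states."""
    return [delta_h] * n_h + [delta_v] * n_v


def delta_to_q(a_d, b_d, tau_diag):
    """Return (A_q, B_q) with A_q = I + tau A_delta and B_q = tau B_delta."""
    n = len(a_d)
    a_q = [[(1.0 if i == j else 0.0) + tau_diag[i] * a_d[i][j] for j in range(n)]
           for i in range(n)]
    b_q = [[tau_diag[i] * v for v in b_d[i]] for i in range(n)]
    return a_q, b_q


if __name__ == "__main__":
    a_d = [[-1.0, 2.0], [3.0, -4.0]]   # hypothetical delta-model
    b_d = [[1.0], [2.0]]
    tau = tau_diagonal(0.5, 0.25, 1, 1)
    print(delta_to_q(a_d, b_d, tau))
```

$C_q$ and $D_q$ equal $C_\delta$ and $D_\delta$, so they need no conversion.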


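Gramians


Analogous to the 1-D and 2-D q-operator cases, the reachability and
observability gramians are proposed as:

$P = \begin{bmatrix} P^{(1)} & P^{(2)} \\ P^{(3)} & P^{(4)} \end{bmatrix} = \frac{1}{(2\pi j)^2} \oint\!\!\oint_{T_\delta^2} \frac{F F^{*}}{(1 + \Delta_h c_h)(1 + \Delta_v c_v)}\, dc_h\, dc_v$;

$Q = \begin{bmatrix} Q^{(1)} & Q^{(2)} \\ Q^{(3)} & Q^{(4)} \end{bmatrix} = \frac{1}{(2\pi j)^2} \oint\!\!\oint_{T_\delta^2} \frac{G^{*} G}{(1 + \Delta_h c_h)(1 + \Delta_v c_v)}\, dc_h\, dc_v$.

Here, $F(c_h, c_v) = ([c_h I \oplus c_v I] - A_\delta)^{-1} B_\delta$ and $G(c_h, c_v) = C_\delta ([c_h I \oplus c_v I] - A_\delta)^{-1}$; $T_\delta^2$
denotes the stability boundary.


Balanced Realizations

It is proposed to call $\{A_\delta, B_\delta, C_\delta, D_\delta\}$ balanced if

$P^{(1)} = Q^{(1)} = \mathrm{diag}\{\sigma_1^h, \sigma_2^h, \ldots\}$;
$P^{(4)} = Q^{(4)} = \mathrm{diag}\{\sigma_1^v, \sigma_2^v, \ldots\}$.


Accomplishments

• Characteristic equation and transfer function, relationship with the
q-model, equivalent transformations, an algorithm to check stability,
etc., are developed.

• Computation of gramians is addressed. For separable systems,
they are block diagonal and may be computed via solution of
four Lyapunov equations.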


COEFFICIENT SENSITIVITY

Coefficient sensitivity of the proposed model is investigated via suitable
differential sensitivity measures. Small perturbations are assumed.


FXP Case

Coefficient perturbation is approximately independent of its nominal
value. Hence, define

$M_{FXP} = \|S_{A_\delta}\|^2 + \|S_{B_\delta}\|^2 + \cdots$

Here, $S_{A_\delta} = \partial H_\delta/\partial A_\delta$, etc.; $H_\delta$ is the transfer function.

Accomplishments

• Realizations that are bound optimal with respect to $M_{FXP}$ are
in fact balanced.

• When $\Delta_h < 1$ and $\Delta_v < 1$, that is, with fast 'sampling', the balanced
δ-model is better than its corresponding q-model.


FLP Case

Coefficient perturbation is approximately proportional to its nominal
value. Hence, define

$M_{FLP} = \|S_{A_\delta}\|^2 + \cdots$

Here, $S_{A_\delta} = \sum_i \sum_j a_{ij_\delta}\,\partial H_\delta/\partial a_{ij_\delta}$, etc.

Accomplishments

• Realizations that are bound optimal with respect to $M_{FLP}$ are
better than their corresponding q-models if

$\|A_q - I\|_F < \|A_q\|_F$.

• High speed narrowband digital filters typically satisfy this
requirement.

Example: Narrowband 5h-5v 2-D separable digital filter

The corresponding q-Roesser model and transfer function are denoted
as $\{A_q, B_q, C_q, D_q\}$ and $H_q(z_h, z_v)$, respectively.

• Let $\{A_q, B_q, C_q, D_q\}$ denote the corresponding balanced
q-system. Under finite wordlength, let the transfer function be
$\hat{H}_q(z_h, z_v)$.

• Let $\{A_\delta, B_\delta, C_\delta, D_\delta\}$ denote the corresponding balanced
δ-system. Under finite wordlength, let the transfer function be
$\hat{H}_\delta$.


Simulation Data

• Mantissa length: $\hat{H}_q$ and $\hat{H}_\delta$ were implemented with different
mantissa lengths (for the coefficients) to see the effects of
coefficient sensitivity.

• The plot shows $\log[E_{max}]$ versus mantissa length. Here, $E_{max}$ is
obtained from $|\hat{H}_q - H_q|$ (for the q-system) or $|\hat{H}_\delta - H_q|$ (for the
δ-system). $H_q$ is implemented with 'infinite' wordlength.

• Clearly, the balanced δ-system performs better than the balanced
q-system (which is 'optimal' with respect to the q-system
counterpart of sensitivity measure $M_{FXP}$).


Figure: Magnitude response of $H_q(z_h, z_v)$ ('infinite' wordlength)

COMPARISON OF IMPLEMENTATIONS


Criterion | FXP CASE | FLP CASE
Quantization error bounds: general system | δ-systems mostly superior | δ-systems better than or equal to q-systems
Quantization error bounds: digital equivalent of continuous-time system with short sampling time | δ-systems mostly superior | δ-systems superior
Limit cycles | δ-systems exhibit limit cycles | q- and δ-systems both exhibit limit cycles only in underflow
Dynamic range constraints: register overflow | δ-systems more likely to cause overflow | Unlikely in both q- and δ-systems
Coefficient sensitivity: general system | δ-systems superior | δ-systems better than or equal to q-systems
Coefficient sensitivity: digital equivalent of continuous-time system with short sampling time | δ-systems superior | δ-systems superior
Hardware requirements: implementation of $\delta^{-1}$ requires additional sum and product | δ-systems require longer registers (for both coefficients and signals) | q- and δ-systems comparable

simulations indicate more complex behavior than is exhibited by
Type-I circuits. However, the existence of an extra port (port 1) in
Type-II references may permit substitution of alternative devices or
subcircuits to effect temperature- or Vka-compensation. In addition,
the voltage-following property may permit trimming of the initial
operating point without effecting changes in the compensation network
which stabilizes the RJD.

A new technique for generating well regulated reference currents
and reference voltages is presented, applied here to MESFET circuits
but applicable to any DFET technology. Simulation results suggest
that supply rejection in these circuits may be comparable to the best
present all-MESFET reference circuits.

Discrete-Time Positive-Real Lemma
Revisited: The Discrete-Time Counterpart
of the Kalman-Yakubovitch Lemma

Abstract: In this paper, we present what can be considered the
discrete-time counterpart of the concept of positive realness and the corresponding
algebraic necessary and sufficient criteria that a discrete-time transfer
function matrix must satisfy. This latter result may be thought of as
the discrete-time counterpart of the Kalman-Yakubovitch Lemma and it
is expected to find application in various areas of study. In particular,
its use in proving the Jury-Lee criterion applicable in absolute stability
studies of a certain class of discrete-time nonlinear systems is shown.

I.  Introduction 

The Kalman-Yakubovitch (KY) Lemma [1-2] applicable in continuous-time (CT) systems and its discrete-time (DT) version [3] have found application in various areas of study, such as network synthesis, spectral factorization, nonlinear system stability, etc. (see [4], and references therein).

Manuscript received January 5, 1994; revised May 8, 1994. This work was partially supported by the Office of Naval Research (ONR) through the grant N00014-94-1-0454.

The authors are with the Department of Electrical and Computer Engineering, P.O. Box 248294, University of Miami, Coral Gables, FL 33124 USA.

IEEE Log Number 9405610.

Fig. 1. [Block diagram; the only recoverable label is "Linear time-invariant system".]

However,  this  DT  version  may  not  be  used  to  prove  the  DT 
counterpart  to  the  Popov  criterion  applicable  in  absolute  stability 
studies  of  CT  systems,  that  is,  the  Jury-Lee  criterion  [5].  Hence, 
it  cannot  be  considered  a  true  DT  counterpart  of  the  KY  Lemma.  In 
fact,  according  to  terminology  established  in  [6],  the  DT  version  in 
[3]  is  the  DT  analog  of  the  KY  Lemma. 

The results in this paper present what can be considered the DT counterpart of positive-realness and the corresponding algebraic necessary and sufficient conditions. This constitutes the solution of an outstanding research problem that has attracted considerable attention [7-9]. The result presented may be used to prove the Jury-Lee criterion, and hence, it can be considered the DT counterpart of the KY Lemma.

II.  Preliminaries 

2.1.  Nomenclature 

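Real and complex numbers are denoted by $\Re$ and $\mathbb{C}$, respectively. The sets of matrices of size $p \times q$ over $\Re$ and $\mathbb{C}$ are $\Re^{p\times q}$ and $\mathbb{C}^{p\times q}$, respectively. The set of rational polynomials over $\Re$ is denoted by $\Re(z)^{p\times q}$. Complex conjugate of $z \in \mathbb{C}$ is $z^*$.

$P^*$ and $P^T$ are the conjugate transpose and transpose of $P \in \mathbb{C}^{p\times q}$, respectively. $P \in \mathbb{C}^{p\times p}$ being positive definite is denoted by $P > 0$. If $P \in \mathbb{C}^{p\times p}$, $\lfloor P \rfloor = P + P^*$; if $P \in \Re^{p\times p}$, $\lfloor P \rfloor = P + P^T$. $G^*(z)$ is the complex conjugate transpose of $G(z) \in \Re(z)^{p\times q}$. Hence, $G^*(z) = G(z^*)^T$. Normal rank of $G(z) \in \Re(z)^{p\times p}$ is rank$[G(z)]$.

The sets $\{z \in \mathbb{C} : |z| = 1\}$ and $\{z \in \mathbb{C} : |z| > 1\}$ are denoted by $\mathcal{T}_0$ and ext$[\mathcal{T}_0]$, respectively.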

2.2.  Absolute  Stability 

Often,  nonlinear  systems  arising  in  practice  may  be  represented  in 
the  form  of  a  linear  time-invariant  system  with  a  possibly  time-variant 
memoryless  nonlinearity  in  the  feedback  path  [10-12].  See  Fig.  1. 
Consider  the  case  when  this  linear  part  is  a  DT  system  possessing 
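the minimal state-space realization $\{A, B, C, D\}$ where

$$q[\mathbf{x}](t) = A\mathbf{x}(t) + B\mathbf{u}(t); \quad \mathbf{y}(t) = C\mathbf{x}(t) + D\mathbf{u}(t), \qquad (2.1)$$

and

$$\mathbf{u}(t) = -\mathbf{f}(t, \mathbf{y}(t)). \qquad (2.2)$$

Here, $\mathbf{u}, \mathbf{y} \in \Re^p$, $\mathbf{x} \in \Re^n$, and $q[\mathbf{x}](t) = \mathbf{x}(t+1)$. Let the nonlinearity $\mathbf{f} : [0,\infty) \times \Re^p \to \Re^p : t \times \mathbf{y} \mapsto \mathbf{f}(t, \mathbf{y})$ be possibly time-variant, memoryless, piecewise continuous in $t$, and locally Lipschitz in $\mathbf{y}$. The transfer function of the DT linear part is

$$G(z) = C\,\Gamma_A(z)^{-1}B + D \quad \text{where} \quad \Gamma_A(z) = (zI - A). \qquad (2.3)$$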

In  stability  studies  of  such  systems,  f  is  typically  taken  to  belong 
to  the  following  class  of  functions: 

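Definition 2.1 [12]: Let $\mathbf{f} : [0,\infty) \times \Re^p \to \Re^p : t \times \mathbf{y} \mapsto \mathbf{f}(t, \mathbf{y})$ be a memoryless nonlinearity, and $\underline{K}$ and $\overline{K}$ be symmetric matrices in $\Re^{p\times p}$ with $\overline{K} - \underline{K} > 0$. Then, $\mathbf{f}$ is said to belong to the sector $[\underline{K}, \overline{K}]$, or $\mathbf{f} \in [\underline{K}, \overline{K}]$, if (1) $\mathbf{f}(t, 0) = 0$, $\forall t \in [0,\infty)$, and (2) $[\mathbf{f}(t,\mathbf{y}) - \underline{K}\mathbf{y}]^T[\mathbf{f}(t,\mathbf{y}) - \overline{K}\mathbf{y}] \le 0$, $\forall t \in [0,\infty)$, $\forall \mathbf{y} \in \Re^p$.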
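
As a hedged illustration of the sector condition in Definition 2.1, the sketch below checks it numerically for the scalar, time-invariant case ($p = 1$). The helper name `in_sector`, the sample grid, and the two test nonlinearities are assumptions made only for this example; a sampled check can refute sector membership but cannot prove it.

```python
import math

def in_sector(f, k_lo, k_hi, samples, tol=1e-12):
    """Sampled check of the scalar sector condition
    f(0) = 0 and (f(y) - k_lo*y) * (f(y) - k_hi*y) <= 0 at every sample y."""
    if f(0.0) != 0.0:
        return False
    return all((f(y) - k_lo * y) * (f(y) - k_hi * y) <= tol for y in samples)

# Hypothetical sample grid on [-5, 5].
samples = [i / 10.0 for i in range(-50, 51)]

tanh_ok = in_sector(math.tanh, 0.0, 1.0, samples)            # tanh lies between 0 and y
cubic_ok = in_sector(lambda y: y ** 3, 0.0, 1.0, samples)    # y**3 leaves the sector for |y| > 1

print(tanh_ok, cubic_ok)
```

The saturation-like nonlinearity stays inside the sector $[0, 1]$ on the grid, while the cubic violates condition (2) as soon as $|y| > 1$.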

Given the configuration in Fig. 1 with $\mathbf{f} \in [\underline{K}, \overline{K}]$, what conditions must be imposed on the frequency response of $G(z)$ to ensure global uniform asymptotic stability? This is generally referred to as the Lur'e problem.¹ When these conditions are satisfied at the origin, the configuration in Fig. 1 is said to be absolutely stable.

In solving the Lur'e problem, two types of Lyapunov function candidates are generally used. The quadratic form $V(\mathbf{x}) = \mathbf{x}^TP\mathbf{x}$, where $P = P^T > 0$ and the nonlinearity is possibly time-variant, yields the Tsypkin criterion [10], [13]. This is the analog of the circle criterion applicable in CT systems [12] which may be proven using the following important result [1-2]:

Lemma 2.1. Kalman-Yakubovitch Lemma: Consider the minimal realization $\{A, B, C, D\}$ of a stable CT transfer function $G(s) \in \Re(s)^{p\times p}$. Then, $G(s)$ is strictly positive-real iff, for some $\epsilon > 0$, $\exists P = P^T > 0$ and matrices $W$ and $L$ such that (1) $A^TP + PA = -L^TL - \epsilon P$; (2) $B^TP = C - W^TL$; and (3) $W^TW = D + D^T$.

The  DT  analog  of  the  above  may  be  used  to  prove  the  Tsypkin 
criterion  [3]: 

Lemma 2.2. DT Analog of the Kalman-Yakubovitch Lemma: Consider the minimal realization $\{A, B, C, D\}$ of a stable DT transfer function $G(z) \in \Re(z)^{p\times p}$. Then, $G(z)$ is strictly positive-real iff $\exists P = P^T > 0$ and matrices $W$ and $L$ such that (1) $A^TPA - P = -L^TL$; (2) $B^TPA = C - W^TL$; and (3) $W^TW = D + D^T - B^TPB$.

The above criteria, in some cases, tend to provide conditions that are conservative [5], [10-11]. Hence, an additional restriction on the slope of the nonlinearity, together with the Lur'e form $V(\mathbf{x}) = \mathbf{x}^TP\mathbf{x} + \beta\int_0^{\mathbf{y}} \mathbf{f}^T(\zeta)K\,d\zeta$ where $P = P^T > 0$, $\beta \in [0,\infty)$, $K = \overline{K} - \underline{K} > 0$, and the nonlinearity is time-invariant, has been utilized [5], [11], [14-15]. This result is generally referred to as the Jury-Lee criterion [7], [16]. Simpler proofs of this, for special cases, appear in [14], [16-17].

The  Jury-Lee  criterion,  as  sampling  frequency  increases,  yields  the 
Popov  criterion  applicable  in  the  CT  case  [5],  [13]  which  may  also 
be  proven  using  the  KY  Lemma  [1-2].  Hence,  the  Jury-Lee  criterion 
may be considered the DT counterpart of the Popov criterion [6].
Since  the  DT  analog  of  the  KY  Lemma  (Lemma  2.2)  cannot  be  used 
to  prove  the  Jury-Lee  criterion,  is  there  a  DT  counterpart  to  the  KY 
Lemma  that  will  serve  the  same  purpose?  Several  previous  attempts 
addressing  this  [7-8]  contained  discrepancies  that  were  later  pointed 
out  [9]. 

In the following, what can be considered the DT counterpart of positive-realness is presented. Next, a related result that facilitates an easier and more direct proof of the Jury-Lee criterion, with possibly multivariable and nonisolated nonlinearities, is given. Hence, it may be considered a 'true' DT counterpart to the KY Lemma, thus providing a solution to an outstanding problem.

¹ The Lur'e problem was originally posed for CT systems. What is being posed here is its DT version.

III.  Discrete-Time  Counterpart  of  Positive-Realness 

What  may  be  considered  the  DT  counterpart  of  positive-realness 
is  defined  as  follows: 

Definition 3.1: Consider a proper square matrix $G(z) \in \Re(z)^{p\times p}$ with a minimal realization $\{A, B, C, D\}$ as in (2.1). $G(z)$ is said to be discrete-time positive-real iff (1) $G(z)$ is analytic in ext$[\mathcal{T}_0]$; and (2) $\lfloor (I + (z-1)V)(G(z) - D) + (I - V)D \rfloor - |(z-1)(G(z) - D)|^2 \ge 0$ in ext$[\mathcal{T}_0]$. Here, $V \in \Re^{p\times p}$ is an arbitrary matrix.

As  in  [3],  the  above  may  be  restated  as  follows: 

Lemma 3.1: Let $G(z)$ be as in Definition 3.1. It is DT positive-real iff (1) $G(z)$ is analytic in ext$[\mathcal{T}_0]$; (2) on $\mathcal{T}_0$, $G(z)$ has only simple poles, and at these locations, the corresponding residue matrix is Hermitian positive semidefinite; and (3) $\lfloor (I + (z-1)V)(G(z) - D) + (I - V)D \rfloor - |(z-1)(G(z) - D)|^2 \ge 0$ on $\mathcal{T}_0$, whenever $G(z)$ exists. Here, $V \in \Re^{p\times p}$ is an arbitrary matrix.

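Equations (3.1-2) and Theorem 3.2 will be used in the ensuing discussion:

$$P - A^TPA = \Gamma_A^*(z)\,P\,\Gamma_A(z) + \lfloor A^TP\,\Gamma_A(z) \rfloor, \quad \forall z \in \mathcal{T}_0, \qquad (3.1)$$

and

$$CA\,\Gamma_A(z)^{-1}B = z(G(z) - D) - CB. \qquad (3.2)$$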
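
The two identities can be sanity-checked numerically. The sketch below does so for the scalar case only, assuming $\Gamma_A(z) = z - A$ and reading the bracket as "a quantity plus its conjugate transpose" (twice the real part for scalars); the function name `identity_errors` and the numerical values are illustrative assumptions.

```python
import cmath

def identity_errors(a, b, c, d, p, theta):
    """Return the absolute errors of the scalar versions of (3.1) and (3.2)
    at the unit-circle point z = exp(j*theta)."""
    z = cmath.exp(1j * theta)
    gamma = z - a                      # scalar Gamma_A(z) = z - A
    g = c * b / gamma + d              # scalar transfer function G(z)

    # (3.1): P - A P A = conj(Gamma) P Gamma + 2 Re(A P Gamma) on |z| = 1
    lhs1 = p - a * p * a
    rhs1 = (gamma.conjugate() * p * gamma).real + 2.0 * (a * p * gamma).real

    # (3.2): C A Gamma^{-1} B = z (G - D) - C B
    lhs2 = c * a * b / gamma
    rhs2 = z * (g - d) - c * b

    return abs(lhs1 - rhs1), abs(lhs2 - rhs2)

errors = [identity_errors(0.5, 1.0, 2.0, 0.3, 1.7, t) for t in (0.1, 1.0, 2.5)]
print(errors)
```

Both errors should be at rounding level for any angle, since (3.1) only uses $|z| = 1$ and (3.2) only uses $z\,\Gamma_A^{-1} = I + A\,\Gamma_A^{-1}$.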

Theorem 3.2. Spectral Factorization Theorem: Consider a proper square matrix $V(z) \in \Re(z)^{p\times p}$. Suppose $V(z) = V^*(z)$ and $V(z) \ge 0$, $\forall z \in \mathcal{T}_0$. Then, $\exists$ a proper stable $T(z) \in \Re(z)^{p\times p}$ such that $V(z) = T^*(z)T(z)$. Moreover, rank$[T(z)] = p$, $\forall z \in \mathcal{T}_0$.

For  convenience,  from  now  on,  we  assume  that  A  is  stable. 
Algebraic  necessary  and  sufficient  criteria  to  satisfy  item  (3)  in 
Lemma  3.1  are  now  given  as  follows: 

Theorem 3.3: Let $G(z)$ be as in Definition 3.1 with $A$ being stable. Then, the following two conditions are equivalent:

(A.) Frequency domain condition: $G(z)$ is DT positive-real, that is, $\lfloor (I + (z-1)V)(G(z) - D) + (I - V)D \rfloor - |(z-1)(G(z) - D)|^2 \ge 0$, $\forall z \in \mathcal{T}_0$. Here, $V \in \Re^{p\times p}$ is arbitrary.

(B.) DT counterpart of the KY Lemma: There exists $P = P^T > 0$ and matrices $W$ and $Q$ such that

$$-Q^TQ = A^TPA - P + (A - I)^TC^TC(A - I); \qquad (B1)$$

$$-W^TQ = B^TPA - [C + VC(A - I)] + B^TC^TC(A - I); \qquad (B2)$$

$$-W^TW = B^TPB + B^TC^TCB - \lfloor VCB + (I - V)D \rfloor. \qquad (B3)$$
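
To make the frequency domain condition (A) concrete, the following sketch evaluates its left hand side on a grid of the unit circle for a scalar system $G(z) = cb/(z - a) + d$. This is only an illustration under stated assumptions: the function name `condition_a_margin`, the grid size, and the numerical values are hypothetical, and a grid search is not a proof of positivity.

```python
import cmath
import math

def condition_a_margin(a, b, c, d, v, n_points=720):
    """Minimum over a unit-circle grid of
    2*Re[(1 + (z-1)v)(G - d) + (1 - v)d] - |(z-1)(G - d)|**2
    for the scalar system G(z) = c*b/(z - a) + d with |a| < 1."""
    worst = math.inf
    for k in range(n_points):
        z = cmath.exp(2j * math.pi * k / n_points)
        g0 = c * b / (z - a)                       # G(z) - D
        h1 = (1 + (z - 1) * v) * g0 + (1 - v) * d  # bracketed term of (A)
        t2 = (z - 1) * g0                          # (z - 1)(G(z) - D)
        worst = min(worst, 2.0 * h1.real - abs(t2) ** 2)
    return worst

margin_pos = condition_a_margin(0.5, 0.1, 0.1, 1.0, 0.0)   # large direct term d
margin_neg = condition_a_margin(0.5, 1.0, 1.0, -1.0, 0.0)  # negative direct term d
print(margin_pos, margin_neg)
```

A nonnegative minimum on the grid is consistent with (A); a negative minimum shows that (A) fails for that choice of $V$.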

Proof: (B) implies (A): Premultiplying by $B^T\Gamma_A^{-*}(z)$, where $\Gamma_A^{-*} = (\Gamma_A^*)^{-1}$, and postmultiplying by $\Gamma_A^{-1}(z)B$, and then using (3.1), (B1) yields

$$B^TPB + \lfloor B^TPA\,\Gamma_A^{-1}B \rfloor = B^T\Gamma_A^{-*}Q^TQ\,\Gamma_A^{-1}B + B^T\Gamma_A^{-*}(A - I)^TC^TC(A - I)\Gamma_A^{-1}B. \qquad (3.3)$$

Note that, using (3.2), we have

$$B^T\Gamma_A^{-*}(A - I)^TC^TC(A - I)\Gamma_A^{-1}B = |(z-1)(G(z) - D)|^2 - B^TC^TCB - \lfloor B^TC^TC(A - I)\Gamma_A^{-1}B \rfloor. \qquad (3.4)$$

Substituting (3.4) in (3.3) yields (3.5), shown at the bottom of the next page. Using $B^TPA$ and $B^TPB$ from (B2) and (B3), respectively, and (3.2), we get (A).
(A) implies (B): Let

$$H_1(z) = (I + (z-1)V)(G(z) - D) + (I - V)D = (C + VC(A - I))\Gamma_A^{-1}B + VCB + (I - V)D$$

and $T_2(z) = (z-1)(G(z) - D) = C(A - I)\Gamma_A^{-1}B + CB$, where we have used (3.2). Hence

$$T_2^*(z)T_2(z) = |(z-1)(G(z) - D)|^2 = B^T\Gamma_A^{-*}(A - I)^TC^TC(A - I)\Gamma_A^{-1}B + B^TC^TCB + \lfloor B^TC^TC(A - I)\Gamma_A^{-1}B \rfloor. \qquad (3.7)$$

Noting that $\{C(A - I), A\}$ is observable (since $\{C, A\}$ is), $\exists R = R^T > 0$ such that

$$(A - I)^TC^TC(A - I) = R - A^TRA = \Gamma_A^*R\,\Gamma_A + \lfloor A^TR\,\Gamma_A \rfloor, \qquad (3.8)$$

where we have used (3.1). Substituting (3.8) in (3.7) yields

$$T_2^*(z)T_2(z) = B^T(R + C^TC)B + \lfloor B^T(RA + C^TC(A - I))\Gamma_A^{-1}B \rfloor. \qquad (3.9)$$

Then, $\lfloor H_2(z) \rfloor = T_2^*(z)T_2(z)$, where

$$H_2(z) = B^T(RA + C^TC(A - I))\Gamma_A^{-1}B + \tfrac{1}{2}B^T(R + C^TC)B. \qquad (3.10)$$

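Now, with $H(z) = H_1(z) - H_2(z)$, $\lfloor H(z) \rfloor$ being the left hand side of (A), we have

$$\lfloor H(z) \rfloor \ge 0, \quad \forall z \in \mathcal{T}_0. \qquad (3.11)$$

Letting

$$V(z) = \lfloor H(z) \rfloor \;\Rightarrow\; V(z) = V^*(z); \quad V(z) \ge 0, \ \forall z \in \mathcal{T}_0. \qquad (3.12)$$

Thus, from Theorem 3.2, $\exists$ a proper stable $T(z) \in \Re(z)^{p\times p}$ such that

$$V(z) = T^*(z)T(z). \qquad (3.13)$$

Let $\{F, G, K, L\}$ be a minimal realization of $T(z)$, that is, $T(z) = K\,\Gamma_F^{-1}G + L$. Hence

$$T^*(z)T(z) = G^T\Gamma_F^{-*}K^TK\,\Gamma_F^{-1}G + L^TL + \lfloor L^TK\,\Gamma_F^{-1}G \rfloor. \qquad (3.14)$$

Since $\{K, F\}$ is observable, $\exists S = S^T > 0$ such that

$$K^TK = S - F^TSF = \Gamma_F^*S\,\Gamma_F + \lfloor F^TS\,\Gamma_F \rfloor. \qquad (3.15)$$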
Therefore, substituting (3.15) in (3.14) yields

$$T^*(z)T(z) = (G^TSG + L^TL) + \lfloor (G^TSF + L^TK)\Gamma_F^{-1}G \rfloor. \qquad (3.16)$$

From (3.11-13), $T^*(z)T(z) = V(z) = \lfloor H(z) \rfloor = \lfloor H_1(z) - H_2(z) \rfloor$. Hence (3.17), which is shown at the bottom of this page. Compare (3.16) and (3.17). Since $F$ and $A$ are stable, match the stable parts to yield

$$(G^TSF + L^TK)\Gamma_F^{-1}G = [C + VC(A - I) - B^T(RA + C^TC(A - I))]\Gamma_A^{-1}B. \qquad (3.18)$$

Hence, $\exists$ a nonsingular matrix $M \in \Re^{n\times n}$ such that

$$M^{-1}FM = A; \quad M^{-1}G = B; \quad (G^TSF + L^TK)M = C + VC(A - I) - B^T(RA + C^TC(A - I)). \qquad (3.19)$$

Now, define $P = M^TSM + R$, $Q = KM$, and $W = L$. Premultiplying by $M^T$ and postmultiplying by $M$, (3.15) yields (B1) when the first equation in (3.19) and (3.8) are used. The last equation in (3.19) yields (B2). Compare the constant terms in (3.16-17) to get (B3). □

Note: The presence of $V$, which can be manipulated as an additional parameter, will be useful in the next section when the Jury-Lee criterion is proven.


IV.  Jury-Lee  Criterion 

The Jury-Lee criterion may be used for absolute stability studies of multivariable DT nonlinear systems [15]. Corresponding sufficient conditions, as obtained in [16-17], are now derived using Theorem 3.3. In [17], the nonlinear system in (2.1) and (2.2) with $D = 0$ is considered. Let the nonlinearity possess the following properties. For all $i = 1, 2, \ldots, p$,
(1) $f_i(0) = 0$; (2) $0 < y_if_i(y_i) < k_iy_i^2$; and (3) $-\underline{k}_i < \dfrac{df_i(y_i)}{dy_i} < \overline{k}_i$. Here, $\underline{k}_i > 0$, $\overline{k}_i > k_i$, $\forall i = 1, 2, \ldots, p$. In [17], it is shown that the existence of $P = P^T > 0$ and matrices $Q$ and $R$ such that

$$A^TPA - P + (A - I)^TC^TM^TMC(A - I) = -Q^TQ;$$
$$B^TPA - NC - SC(A - I) + B^TC^TM^TMC(A - I) = -R^TQ;$$
$$B^TPB + B^TC^TM^TMCB - 2NK^{-1} - \lfloor SCB \rfloor = -R^TR \qquad (4.1)$$

is sufficient for absolute stability. Here, $M$ is a certain diagonal matrix with positive elements associated with the slope condition, $N = N^T > 0$, and $S$ is a diagonal matrix. Also, $K = \mathrm{diag}\{k_1, \ldots, k_p\}$.


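$$B^TPB + \lfloor B^TPA\,\Gamma_A^{-1}B \rfloor = B^T\Gamma_A^{-*}Q^TQ\,\Gamma_A^{-1}B + |(z-1)(G(z) - D)|^2 - B^TC^TCB - \lfloor B^TC^TC(A - I)\Gamma_A^{-1}B \rfloor$$
$$= |W + Q\,\Gamma_A^{-1}B|^2 + |(z-1)(G(z) - D)|^2 - B^TC^TCB - W^TW - \lfloor (W^TQ + B^TC^TC(A - I))\Gamma_A^{-1}B \rfloor. \qquad (3.5)$$

$$T^*(z)T(z) = \lfloor [C + VC(A - I) - B^T(RA + C^TC(A - I))]\Gamma_A^{-1}B + VCB + (I - V)D \rfloor - B^T(R + C^TC)B. \qquad (3.17)$$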
Substitute the following in (B1), (B2), and (B3) in Theorem
3.3:  A  A;  B  C  MC\  [(/  -  V)D\  ^ 

2MK~^ N~'^ M'^ .  Now,  (A)  yields  the  following  frequency  domain 
condition  which  is  identical  to  that  in  [17]: 

$$NK^{-1} + \tfrac{1}{2}\lfloor [N + (z-1)S]G(z) \rfloor - \tfrac{1}{2}|z-1|^2\,G^*(z)M^TMG(z) \ge 0, \quad \forall z \in \mathcal{T}_0.$$


V.  Conclusion  and  Final  Remarks 

What  may  be  considered  the  DT  counterpart  of  positive-realness 
and  the  corresponding  algebraic  necessary  and  sufficient  conditions 
(Theorem 3.3) have been presented. This latter result facilitates the proof of the Jury-Lee criterion and can be thought of as the DT counterpart of the KY Lemma, thus successfully addressing an
outstanding  research  problem  [7-9].  It  is  also  expected  to  find  use 
in  generalizing  the  Jury-Lee  criterion  and  in  various  other  areas  of 
study,  such  as,  network  synthesis,  spectral  factorization,  etc. 
Impact of Distributed Gate Resistance
on the Performance of MOS Devices

Behzad Razavi, Ran-Hong Yan, and Kwing F. Lee

Abstract: This paper describes the impact of gate resistance on cut-off frequency ($f_T$), maximum frequency of oscillation ($f_{max}$), thermal noise, and time response of wide MOS devices with deep submicron channel lengths. The value of $f_T$ is proven to be independent of gate resistance even for distributed structures. An exact relation for $f_{max}$ is derived and it is shown that, to predict $f_{max}$, thermal noise, and time response, the distributed gate resistance can be divided by a factor of 3 and lumped into a single resistor in series with the gate terminal.

I. Introduction

The  remarkable  improvement  in  the  performance  of  CMOS  circuits 
as  a  result  of  scaling  has  motivated  extensive  research  on  deep 
submicron  MOS  devices  [1],  [2].  While  short  channel  effects  such  as 
velocity  saturation  and  threshold  voltage  variation  become  significant 
for dimensions below approximately 2 μm, other nonidealities manifest themselves only for very small channel lengths. In particular, the
gate  resistance  of  a  short-channel  device  can  substantially  affect  its 
performance  if  the  transistor  width  is  increased  to  attain  high  current 
drive  or  large  transconductance.  This  effect  becomes  especially 
noticeable  in  line  drivers  and  output  buffers  used  in  digital  systems 
and  low-noise,  high-gain  amplifiers  employed  in  analog  applications, 
all of which typically require wide MOSFETs.

Even  though  the  overall  gate  resistance  can  be  lowered  through 
silicidation  or  the  use  of  multiple  gates,  these  remedies  have  certain 
limitations.  For  example,  the  thickness  of  gate  silicide  must  scale 
with  channel  length,  thereby  yielding  a  higher  sheet  resistivity  for 
shorter  devices  [1].  Also,  increasing  the  number  of  gates  (to  allow 
narrower  devices  for  a  given  total  width)  tends  to  increase  the  source 
or  drain  junction  capacitance  and  degrade  circuit  density. 

This  paper  describes  the  impact  of  distributed  gate  resistance 
on  four  aspects  of  the  performance  of  deep  submicron  devices: 
cut-off frequency ($f_T$), maximum frequency of oscillation ($f_{max}$),
input-referred  thermal  noise,  and  time  response.  The  primary  goal 
is  to  quantify  this  impact  with  relatively  simple  expressions,  thus 
allowing  technologists  and  circuit  designers  to  easily  determine  the 
maximum  gate  resistance  that  can  be  tolerated  in  a  given  application. 
The  analyses  are  performed  for  an  NMOS  transistor  whose  gate  is 
contacted  only  at  one  end,  but  the  results  can  be  readily  applied  to  all 
field  effect  devices  and  structures  with  multiple  gate  contacts  as  well. 

The next section of the paper analyzes the effect of gate resistance on $f_T$. Sections III to V, respectively, formulate the dependence of $f_{max}$, thermal noise, and time response on the gate resistance. Section VI summarizes the results.

II. Cut-off Frequency

Defined as the frequency at which the short-circuit small-signal current gain of a transistor drops to unity, $f_T$ is a measure of the speed of the intrinsic device excluding its junction parasitics. If the gate resistance is modelled as a lumped resistor in series with the
Robust Stability of Time-Variant Discrete-Time
Systems with Bounded Parameter Perturbations

Kamal Premaratne and Mohamed Mansour


Abstract: In this paper, global asymptotic stability of linear, time-variant, finite dimensional, zero input difference equations is investigated. We propose a technique that may be utilized to obtain regions of asymptotic stability in the coefficient space that incorporate information regarding the maximum rate of change of system parameters. Use of different matrix norms provides different "shapes" for the maximum allowable coefficient perturbations.

Nomenclature

$\Re$, $\mathbb{Z}$: Real and integer numbers.

$\aleph_+$, $\aleph$: Positive and non-negative integers.

$\Re^k$, $\Re^{k\times\ell}$: Vectors of length $k$ over $\Re$, matrices of size $k \times \ell$ over $\Re$.

$0_{k\times\ell}$, $I_{k\times\ell}$: Null matrix and identity matrix of size $k \times \ell$.

Given quantity $[\cdot]$ of a TV system, the analogous quantity of the corresponding TI system is denoted by $[\hat{\cdot}]$. For example,

$\mathbf{a}(n) \in \Re^m$: Coefficient vector $[a_1(n), \ldots, a_m(n)]^T$ of a TV difference equation of order $m$.

$\hat{\mathbf{a}} \in \Re^m$: Coefficient vector of the corresponding TI difference equation of order $m$, that is, $[\hat{a}_1, \ldots, \hat{a}_m]^T$.

$A(n) \in \Re^{m\times m}$: Companion form corresponding to $\mathbf{a}(n)$ (see (3.3)).

$\lambda_i[A]$: Eigenvalues of $A \in \Re^{m\times m}$.

$\|A\|_p$: Induced $p$-norm of $A \in \Re^{m\times m}$, that is, $\sup_{\mathbf{x} \ne 0} \|A\mathbf{x}\|_p / \|\mathbf{x}\|_p$. Here, $p \in \{\aleph_+ \cup \infty\}$.

$\Delta_i(n) \in \Re$: Perturbation on the coefficient $a_i(n) \in \Re$ at time instant $n$.

$\Delta(n) \in \Re^{m\times m}$: Perturbation matrix on system matrix $A(n) \in \Re^{m\times m}$ at time instant $n$ (see (3.17)).

$P_n^{(j)}$: Product of $j + 1$ consecutive system matrices that are TV, that is, $\prod_{i=0}^{j} A(n+i) = A(n+j) \cdot A(n+j-1) \cdots A(n)$ (see (3.20)).

I. Introduction

Parameter  uncertainties  are  inherent  in  system  models  utilized  for 
analysis  and  design.  The  explosion  of  recent  research  activity  in 
related  areas,  in  particular,  in  the  area  of  robust  stability,  is  mainly  due 
to the seminal work of Kharitonov [1]. Since this result, robust stability of time-invariant (TI) systems with uncertain parameters has received considerable attention (see [2-3], and references therein).

Many  important  results  regarding  robust  stability  of  time-variant 
(TV)  systems  with  uncertain  parameters  are  also  available.  Some 
earlier  results  appear  in  [4-6],  and  references  therein;  newer  results 
are  constantly  being  introduced  (see  [7-11],  and  references  therein). 
Such  systems  find  application  in  various  branches  in  signal  processing 
and control, such as adaptive signal processing, finite wordlength
implementation  of  digital  filters  [12],  and  design  of  reconfigurable 
systems  [13]. 

For  a  TV  system  represented  in  its  difference  equation  formulation, 
the  work  in  [10]  provides  a  region  in  the  coefficient  space  wherein  the 
coefficients  may  vary  while  maintaining  global  asymptotic  stability 
(GAS). For a TV system represented in its state-space (SS) formulation, the work in [11] provides a necessary and sufficient condition for robust GAS. However, in these works, no restriction has been imposed on the maximum rate of change of the coefficients whereas, in most practical situations,
such  a  restriction  is  typically  inherent.  An  important  outstanding 
research  problem  is  to  incorporate  such  information  and  obtain  a 
region  (or,  regions)  in  the  coefficient  space  where  GAS  of  a  TV 
system  is  guaranteed  [2,  Open  Problem  #9]. 

The  work  below  attempts  to  address  the  above  problem.  The  results 
presented,  as  they  stand,  can  be  computationally  demanding.  For 
second-order  systems  at  least,  it  is  quite  conveniently  applicable. 
The  authors  hope  that  this  work  may  serve  as  an  impetus  for 
further  improvements.  The  paper  is  organized  as  follows:  Section 
II  formulates  the  problem  where,  for  the  readers’  convenience,  we 
follow  the  same  notation  as  in  [10].  A  different  but  enlightening 
proof  of  the  main  result  in  [10]  is  provided.  Section  III  contains  the 
main  results.  A  procedure  that  can  generate  a  region  in  the  coefficient 
space  that  guarantees  asymptotic  stability  (AS)  is  described.  Section 
IV provides an example. Section V contains concluding remarks.

II. Problem Formulation

Consider the following linear, possibly time-variant (TV), finite dimensional, zero input, difference equation of order $m$:

$$y(n) = \sum_{i=1}^{m} a_i(n)\,y(n-i) = \mathbf{a}(n)^T\mathbf{y}(n-1). \qquad (2.1)$$

Definition II.1: The TV system in (2.1) is said to be asymptotically stable in $\Omega$ if, for any $\mathbf{y}(0) = [y(-1), \ldots]$, $\lim_{n\to\infty} y(n) = 0$ whenever $\mathbf{a}(n) \in \Omega$, $\forall n$.

Remarks:

1) Due to the fact that the system under consideration is linear, AS, as defined above, is equivalent to the notion of GAS [14, ch. 3].

2) In investigating such regions of AS, one may distinguish between TI and TV regions [10]. In this paper, our interest lies in obtaining only TI regions of AS.
Now, consider $\Omega \subset \Re^m$. The problem of determining the "largest" such region $\Omega$ so that, whenever $\mathbf{a}(n) \in \Omega$, $\forall n$, AS of (2.1) is guaranteed has been addressed in [10]. A relevant result is

Theorem II.1 [10]: Let $\Omega = \{\mathbf{a}(n) \in \Re^m : \sum_{i=1}^{m} |a_i(n)| \le \gamma < 1, \forall n\}$. Then, whenever $\mathbf{a}(n) \in \Omega$, $\forall n$, the TV system in (2.1) is AS. Moreover, such a TI region for AS may not contain any point $\mathbf{a}(n)$ that satisfies $|a_i(n)| = 1 + \epsilon$, for any $\epsilon > 0$.

Remark: The latter part of Theorem II.1 indicates that the region $\Omega$ is in fact the largest hyperdiamond region with the origin as its center. In this sense, the region $\Omega$ is in fact not too conservative.
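
The behavior guaranteed by Theorem II.1 can be observed in a small simulation. The sketch below is an illustration under assumptions: the coefficient vectors are drawn at random inside the hyperdiamond $\sum_i |a_i(n)| \le \gamma$, and the names `random_coeffs` and `simulate`, the seed, and the parameter values are hypothetical. Because each new sample is bounded by $\gamma$ times the largest of the previous $m$ samples, the peak over a window of $m$ samples shrinks by at least a factor $\gamma$ every $m$ steps.

```python
import random

def random_coeffs(m, gamma, rng):
    """Draw a coefficient vector with sum(|a_i|) <= gamma (inside the hyperdiamond)."""
    raw = [rng.uniform(-1.0, 1.0) for _ in range(m)]
    scale = gamma * rng.random() / sum(abs(r) for r in raw)
    return [r * scale for r in raw]

def simulate(m, gamma, blocks, seed=0):
    """Run y(n) = sum_i a_i(n) y(n-i) for blocks*m steps with freshly drawn
    coefficients at every step; return the initial and final window peaks."""
    rng = random.Random(seed)
    past = [rng.uniform(-1.0, 1.0) for _ in range(m)]   # y(n-1), ..., y(n-m)
    peak0 = max(abs(v) for v in past)
    for _ in range(blocks * m):
        a = random_coeffs(m, gamma, rng)
        y = sum(ai * yi for ai, yi in zip(a, past))
        past = [y] + past[:-1]                          # shift the window by one sample
    return peak0, max(abs(v) for v in past)

peak0, peak = simulate(m=3, gamma=0.9, blocks=20)
print(peak0, peak, (0.9 ** 20) * peak0)
```

The final peak stays below $\gamma^{20}$ times the initial peak no matter how fast the coefficients change, which is exactly why no rate-of-change information is needed inside this region.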

Therefore, with no restriction imposed on the amount of perturbation allowable on each coefficient $a_i(n)$ at each time instant, as long as $\mathbf{a}(n) \in \Omega$, $\forall n$, AS of the system in (2.1) is guaranteed. However, in practice, the rate of change of each coefficient $a_i(n)$ is restricted. Can we incorporate this additional restriction into the above result? Intuitively, the region obtained thus must be larger than that indicated in Theorem II.1. It is this problem (see also [2], Open Problem #9) that we attempt to address.

III. Main Results

By expressing the system in (2.1) in its SS formulation, we now prove Theorem II.1 utilizing norm arguments. Such a proof has the following advantages: a) it exposes a certain interesting norm property of companion matrices (see Lemma III.1); b) the proof of Theorem II.1 then follows quite easily; c) the SS formulation is an ideal tool for the problem at hand; and d) it provides the possibility of using different norms, thus yielding different regions of AS.

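Clearly, utilizing the state variables

$$x_m(n) = y(n); \quad x_{m-1}(n) = x_m(n-1) = y(n-1); \quad \ldots; \quad x_1(n) = x_2(n-1) = y(n-m+1), \qquad (3.1)$$

the system in (2.1) may be expressed as

$$\mathbf{x}(n) = A(n) \cdot \mathbf{x}(n-1); \quad y(n) = C \cdot \mathbf{x}(n), \qquad (3.2)$$

where $\mathbf{x}(n) = [x_1(n), \ldots, x_m(n)]^T$, and

$$A(n) = \begin{bmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & & & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \\ a_m(n) & a_{m-1}(n) & \cdots & a_2(n) & a_1(n) \end{bmatrix} \in \Re^{m\times m}; \quad C = \begin{bmatrix} 0 & 0 & \cdots & 0 & 1 \end{bmatrix} \in \Re^{1\times m}. \qquad (3.3)$$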

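Hence

$$\mathbf{x}(n+j) = P_n^{(j)} \cdot \mathbf{x}(n-1) \quad \text{and} \quad y(n+j) = C \cdot P_n^{(j)} \cdot \mathbf{x}(n-1), \qquad (3.4)$$

where

$$P_n^{(j)} = \prod_{i=0}^{j} A(n+i) = A(n+j)A(n+j-1) \cdots A(n+1)A(n). \qquad (3.5)$$

Now, for a given $j = 0, 1, 2, \ldots$, if

$$\|P_n^{(j)}\| \le \gamma < 1, \quad \forall n, \qquad (3.6)$$

then, $\lim_{n\to\infty} y(n) = 0$ implying AS of (3.2) (or (2.1)). Here, $\|\cdot\|$ denotes any mutually consistent matrix norm [14], in particular, the $p$-norms.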

Remarks:

1) Note that

$$P_n^{(j)} = A(n+j) \cdot P_n^{(j-1)}, \quad \forall n. \qquad (3.7)$$

2) Note that $A(n+j)$ is the corresponding system matrix in its companion form as in (3.3). Taking this structure into account, one notices that premultiplication of $P_n^{(j-1)}$ by $A(n+j)$ simply shifts the last $m-1$ rows of $P_n^{(j-1)}$ upwards by one row. Hence, the first $m-1$ rows of $P_n^{(j)}$ are identical to the last $m-1$ rows of $P_n^{(j-1)}$.

3) With remark 2 in mind, it is not difficult to show that $P_n^{(j)}$ has the partitioned form given in (3.8).

[Equation (3.8): partitioned form of $P_n^{(j)} \in \Re^{m\times m}$ containing an identity block $I_{(m-j)\times(m-j)}$, for $j = 1, \ldots, m$.]

Using the $\infty$-norm, the condition for AS in (3.6) may now be utilized to prove Theorem II.1. First, we need

Lemma III.1: Consider the TV system in (3.2)-(3.3). Whenever $\mathbf{a}(n) \in \Omega$, $\forall n$,

$$\|P_n^{(m-1)}\|_\infty = \Big\|\prod_{i=0}^{m-1} A(n+i)\Big\|_\infty = \|A(n+m-1)A(n+m-2) \cdots A(n+1)A(n)\|_\infty \le \gamma < 1, \quad \forall n.$$

Here, $\Omega$ is the region given by Theorem II.1.

Proof: What we need to ascertain is that, whenever $\mathbf{a}(n) \in \Omega$, $\forall n$, the product of $m$ consecutive $A(n+i)$'s, that is, $P_n^{(m-1)}$, $\forall n$, has an $\infty$-norm of not more than $\gamma$. With remark 2 above in mind, we show that, given $\mathbf{a}(n) \in \Omega$, $\forall n$, the newly computed last row of each successive product has an $\infty$-norm of not more than $\gamma$. If this holds true for $m$ consecutive products, we will have $\|P_n^{(m-1)}\|_\infty \le \gamma < 1$, as desired. We proceed with an inductive scheme on $j = 1, 2, \ldots, m-1$. First, when $j = 1$, clearly the claim is true.

Next, assume that, for some $\ell = 1, 2, \ldots, m-1$, the last row of $P_n^{(\ell)}$ has an $\infty$-norm of not more than $\gamma$. Noting that $P_n^{(\ell)}$ is arrived at by $\ell$ consecutive multiplications, each of its newly computed rows has an $\infty$-norm of not more than $\gamma$. We need to show that the newly computed last row of $P_n^{(\ell+1)}$ also has an $\infty$-norm of not more than $\gamma$. Note that, from (3.8), elements of this last row are given by

[Element-wise expressions for the last row, stated separately for $\beta = 1, \ldots, \ell$ and for $\beta = \ell+1, \ldots, m$.]

Hence, the $\infty$-norm of this last row is given by the equations shown at the bottom of the next page. This completes the proof. □

Corollary III.2: Whenever $\mathbf{a}(n) \in \Omega$, $\forall n$, the TV system in (3.2) is AS.

Proof: This is immediate from Lemma III.1 and (3.6).
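
The norm property behind Lemma III.1 can be observed numerically. The sketch below is a plain-Python illustration under assumptions: the helper names (`companion`, `matmul`, `inf_norm`, `product_norms`), the seed, and the choice of coefficient vectors on the boundary $\sum_i |a_i| = \gamma$ are all hypothetical. It premultiplies by one fresh companion matrix at a time and records the $\infty$-norm of each running product.

```python
import random

def companion(a):
    """Companion matrix: shifted identity on top, last row [a_m, ..., a_1]."""
    m = len(a)
    rows = [[1.0 if c == r + 1 else 0.0 for c in range(m)] for r in range(m - 1)]
    rows.append(list(reversed(a)))
    return rows

def matmul(x, y):
    """Product of two square matrices stored as lists of rows."""
    n = len(y)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(len(x))]

def inf_norm(x):
    """Induced infinity-norm: largest absolute row sum."""
    return max(sum(abs(v) for v in row) for row in x)

def product_norms(m, gamma, seed=1):
    """Infinity-norms of the running products of 1, 2, ..., m companion matrices,
    each built from a random coefficient vector with sum(|a_i|) = gamma."""
    rng = random.Random(seed)
    norms, prod = [], None
    for _ in range(m):
        raw = [rng.uniform(-1.0, 1.0) for _ in range(m)]
        scale = gamma / sum(abs(r) for r in raw)
        a_mat = companion([r * scale for r in raw])
        prod = a_mat if prod is None else matmul(a_mat, prod)  # premultiply by the newest matrix
        norms.append(inf_norm(prod))
    return norms

norms = product_norms(m=4, gamma=0.95)
print(norms)
```

Products of fewer than $m$ matrices still contain an untouched row of the shifted identity, so their $\infty$-norm cannot drop below 1; only the product of $m$ matrices has every row replaced and reaches the bound $\gamma$.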

Remark: Note that,

$$\|P_n^{(i)}\|_\infty \ge 1 \quad \text{for } i < m - 1. \qquad (3.9)$$
Hence, if the norm condition in (3.6) is to be utilized for AS investigations, it is necessary to deal with at least $m$ consecutive multiplications. The result in [10] utilizes exactly $m$ such multiplications (see Lemma III.1).

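Now, consider the following TI counterpart of the system in (3.2)-(3.3):

$$\mathbf{x}(n) = \hat{A} \cdot \mathbf{x}(n-1); \quad y(n) = C \cdot \mathbf{x}(n), \qquad (3.10)$$

where

$$\hat{A} = \begin{bmatrix} 0 & 1 & \cdots & 0 & 0 \\ 0 & 0 & \cdots & 0 & 0 \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & \cdots & 0 & 1 \\ \hat{a}_m & \hat{a}_{m-1} & \cdots & \hat{a}_2 & \hat{a}_1 \end{bmatrix} \in \Re^{m\times m}. \qquad (3.11)$$

This TI system is AS if

$$|\lambda_i[\hat{A}]| < 1, \quad \forall i = 1, \ldots, m. \qquad (3.12)$$

Hence, the corresponding AS region is $\hat{\Omega}_1^{(\infty)}$ where

$$\hat{\Omega}_\omega^{(\infty)} = \{\hat{\mathbf{a}} \in \Re^m : |\lambda_i[\hat{A}]| < \omega, \ \forall i = 1, \ldots, m\}, \qquad (3.13)$$

where $\omega \in \Re$. On the other hand, for $i \in \aleph_+$, let

$$\hat{\Omega}_\omega^{(i)} = \{\hat{\mathbf{a}} \in \Re^m : \|\hat{P}^{(i)}\| < \omega\}, \qquad (3.14)$$

where $\hat{P}^{(i)} = \hat{A}^i$. Note that, in general,

$$\hat{\Omega}_\omega^{(i)} \not\subset \hat{\Omega}_\omega^{(i+1)}. \qquad (3.15)$$

However, since $\lim_{i\to\infty} \hat{A}^i = 0_{m\times m}$ for $\hat{\mathbf{a}} \in \hat{\Omega}_1^{(\infty)}$, we have

$$\lim_{i\to\infty} \hat{\Omega}_1^{(i)} = \hat{\Omega}_1^{(\infty)}. \qquad (3.16)$$

In fact, although $\hat{\Omega}_\omega^{(i)} \not\subset \hat{\Omega}_\omega^{(i+1)}$, typically, $\hat{\Omega}_\omega^{(i+1)}$ is larger in "volume" than $\hat{\Omega}_\omega^{(i)}$. For $m = 2$, Fig. 1 shows $\hat{\Omega}_1^{(i)}$ for $i = 1, 2, 3, 4, 5$, where $\hat{\Omega}_1^{(\infty)}$, the familiar triangular stability region, is also shown. Similar figures for other norms may also be obtained.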
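
A hedged numerical reading of these regions for $m = 2$ and the $\infty$-norm: a TI coefficient vector belongs to the region of index $i$ (with bound $\omega = 1$) when the $i$-th power of its companion matrix has $\infty$-norm below 1. The sketch below searches for the smallest such $i$; the helper names, the search limit `max_power`, and the test points are assumptions chosen for illustration.

```python
def companion(a_hat):
    """Companion matrix: shifted identity on top, last row [a_m, ..., a_1]."""
    m = len(a_hat)
    rows = [[1.0 if c == r + 1 else 0.0 for c in range(m)] for r in range(m - 1)]
    rows.append(list(reversed(a_hat)))
    return rows

def matmul(x, y):
    """Product of two square matrices stored as lists of rows."""
    n = len(y)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(len(x))]

def inf_norm(x):
    """Induced infinity-norm: largest absolute row sum."""
    return max(sum(abs(v) for v in row) for row in x)

def first_contractive_power(a_hat, omega=1.0, max_power=50):
    """Smallest i with ||A_hat**i||_inf < omega, or None if none is found
    within max_power (an arbitrary search limit)."""
    a_mat = companion(a_hat)
    power = a_mat
    for i in range(1, max_power + 1):
        if inf_norm(power) < omega:
            return i
        power = matmul(a_mat, power)
    return None

inside_diamond = first_contractive_power([0.3, 0.4])     # |a1| + |a2| < 1
outside_diamond = first_contractive_power([1.2, -0.5])   # stable, but |a1| + |a2| > 1
unstable = first_contractive_power([1.0, 0.5])           # an eigenvalue outside the unit circle
print(inside_diamond, outside_diamond, unstable)
```

The point inside the hyperdiamond is captured at $i = 2 = m$, the stable point outside it needs a higher power, and the unstable point is never captured, which matches the picture of regions that grow toward the triangular stability region as $i$ increases.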


[Displayed equations for the proof of Lemma III.1: a chain of bounds on the $\infty$-norm of the newly computed last row, ending in $\sum_{i=1}^{m} |a_i| \le \gamma < 1$.]
Now, it is instructive to see whether a norm condition on $P_n^{(i)}$, $i \ge m$, may provide a larger region of AS. Given $\mathbf{a}(n)$, let us denote the perturbation on $a_i(n) \in \Re$ at time instant $n$ by $\Delta_i(n) \in \Re$. Taking

$$\Delta(n) = \begin{bmatrix} 0_{(m-1)\times m} \\ \Delta_m(n) \ \ \Delta_{m-1}(n) \ \ \cdots \ \ \Delta_2(n) \ \ \Delta_1(n) \end{bmatrix} \in \Re^{m\times m}, \qquad (3.17)$$

we now have

$$A(n+1) = A(n) + \Delta(n);$$
$$A(n+2) = A(n+1) + \Delta(n+1) = A(n) + \Delta(n+1) + \Delta(n);$$
$$\vdots$$
$$A(n+j) = A(n) + \sum_{k=0}^{j-1} \Delta(n+k). \qquad (3.18)$$

Then, we may obtain a bound for $\|\Delta_n^{(j+1)}\|$ as a function of $\|\Delta\|^i$ and $\|A(n)^i\|$, $\forall i = 1, \ldots, j$, that is,

$$\|\Delta_n^{(j+1)}\| \le g(\|\Delta\|, \ldots, \|\Delta\|^j, \|A(n)\|, \ldots, \|A(n)^j\|). \qquad (3.28)$$

Next, in order to enforce a bound on $g$, within $\hat{\Omega}_\delta^{(j+1)}$ find the maximum values of $\|A(n)^i\|$, $i = 1, \ldots, j$. For this, one needs to perform a search over the $(m-1)$-dimensional boundary of $\hat{\Omega}_\delta^{(j+1)}$. When these maximum values are substituted, the bound for $g$ obtained thus is a polynomial of degree $j$ in $\|\Delta\|$ with no constant coefficient. Hence, let this bound be

$$h = \sum_{i=0}^{j} h_i\|\Delta\|^i < 0, \quad h_0 = -(1 - \delta), \quad 0 < \delta < 1. \qquad (3.29)$$


Hence 

=  P[A(n  +  i)  =  ]][  M(«)  +  ^  A(i!.  +  A:)  ]  (3.19) 

i  =  0  i=0  \  k=0  J 

=  pO+i)+AO+»^  (3.20) 

where 

=  A{ny+^  e  (3.21) 

is  obtained  by  consecutive  multiplication  of  the  same  system  matrix 
A(77)  (compare  with  and  G  is  a  certain  matrix 

that  is  solely  due  to  the  coefficient  perturbations.  In  fact, 

lim  a5/+‘)  =  On,xn,.  (3.22) 

Vtirl . m 

It  is  (3.20)  that  we  will  utilize  to  arrive  at  our  goal. 

To  satisfy  the  norm  condition  in  (3.6),  we  need 

+  aJ/+‘’||  <  7  <  1-  (3.23) 

To  ensure  (3.23),  let  ||P^^'+'’||  +  ||A|/+'>|i  <  7  <  1.  which  is 
satisfied  if  the  following  two  conditions  are  valid: 

(cl)  Choose  a(r?.)  G  3?"'  such  that 

<  <^  <  7,  that  is,  a(77)  G  V/7.  (3.24) 

(c2)  Choose  the  perturbation  allowed  on  each  coefficient  such  that 

IIaL-^+'^II  <1-6.  (3.25) 

It  is  not  hard  to  see  that,  in  general,  is  a  function  of 

A(77  +  7),  Vi  —  -  1,  and  A{n)\  i  =  Hence, 

it  may  be  expressed  as 

a5,-'+‘>  =  f(A(7!.),...,  A(j;. +  f  -1);.4(«) . 4(/0'').  (3.26) 

Taking  the  maximum  allowable  rate  of  change  at  each  time  instant 
to  be  equal,  let 


Clearly, $\|\Delta\| = 0$ (that is, the TI case) satisfies (3.29). Let $r_i$, $i = 1, 2, \ldots$, be the non-negative (we are only interested in this case since $\|\Delta\| \ge 0$) roots of $h = 0$. Note that, 0 cannot be a root of $h$. If $0 < r_1 < r_2 < \cdots$, we conclude that $0 \le \|\Delta\| < r_1$. Note that, $r_2 < \|\Delta\| < r_3$ is of no use since, at any arbitrary instant in time, we must allow for the coefficients to be stationary, that is, $\|\Delta\| = 0$.

Summarizing the above, assuming (3.27), we have the following:

Step I: Pick $j = 2$. With $j = 1$, Theorem II.1 provides the "optimal" region. For a suitable $0 < \delta < 1$, let $a(n) \in \Omega_{p,\delta}^{(j+1)}$, $\forall n$.

Step II: Find the bounding function $g$ in (3.28). An appropriate algorithm that is applicable to a system of any general order is in [15].

Step III: On the $(m-1)$-dimensional boundary of $\Omega_{p,\delta}^{(j+1)}$, find the maximum values of $\|A(n)^i\|$, $i = 1, \ldots, j$.

Step IV: Obtain $h$ in (3.29).

Step V: The maximum allowable rate of change of coefficients within $\Omega_{p,\delta}^{(j+1)}$ is the least positive root of $h$.

Step VI: If the actual allowable rate of change is higher, one may repeat the procedure with a lower $\delta$ resulting in a smaller region. If the actual allowable rate of change is restricted to be lower, one may repeat the procedure with a higher $j$ resulting in a larger region.

Remarks: 


1)  Step  III  can  be  very  computer  intensive.  For  second-order 
systems,  however,  this  search  procedure  is  quite  easy.  See  next 
section. 

2)  By  using  different  norms,  one  may  obtain  different  “shapes”  of 
regions  (in  the  coefficient  space)  for  the  maximum  allowable 
perturbation on the coefficients. For example, the $\infty$-norm gives a
diamond-shaped  region;  2-norm  gives  a  circular  region;  1-norm 
gives  a  box-shaped  region. 


V.  Example 

Our  main  results  are  now  illustrated  through  an  example.  Due  to  the 
light  computer  burden  and  the  possibility  of  graphically  representing 
the  relevant  regions,  we  consider  a  TV  system  of  order  2. 

Step I: Let $j = 2$. The regions $\Omega_{\infty,\delta}^{(3)}$ for $\delta$ = 0.5, 0.6, 0.7, 0.8, 0.9, 1.0 are in Fig. 2.

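Step II: Note that, $P_n^{(3)} = (A(n) + \Delta(n) + \Delta(n+1))(A(n) + \Delta(n))A(n) = \bar{P}_n^{(3)} + \Delta_n^{(3)}$, where $\bar{P}_n^{(3)} = A(n)^3$ and

$$\Delta_n^{(3)} = (\Delta(n) + \Delta(n+1))(A(n) + \Delta(n))A(n) + A(n)\Delta(n)A(n)$$
$$= \Delta(n)A(n)^2 + \Delta(n)^2 A(n) + \Delta(n+1)A(n)^2 + \Delta(n+1)\Delta(n)A(n) + A(n)\Delta(n)A(n).$$

Recall the bound assumed on the perturbations:

$$\|\Delta(n+i)\| \le \|\Delta\|, \quad \forall i = 0, \ldots, j-1. \tag{3.27}$$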
Fig. 2. The regions $\Omega_{\infty,\delta}^{(3)}$, $\delta$ = 0.5, 0.6, 0.7, 0.8, 0.9, 1, for a second-order system.


TABLE I

$\|\Delta\|_{\infty,\max}$ Obtained with $\Omega_{\infty,\delta}^{(3)}$ Regions for a Second-Order System

$\delta$      $\|A(n)\|_{\infty,\max} = \|A(n)^2\|_{\infty,\max}$      $\|\Delta\|_{\infty,\max}$
0.900      1.886      0.014
0.800      1.777      0.029
0.700      1.664      0.048
0.600      1.547      0.070
0.500      1.415      0.098

TABLE II

$\|\Delta\|_{\infty,\max}$ Obtained with $\Omega_{\infty,\delta}^{(4)}$ Regions for a Second-Order System

$\delta$      $\|A(n)\|_{\infty,\max} = \|A(n)^2\|_{\infty,\max}$      $\|A(n)^3\|_{\infty,\max}$      $\|\Delta\|_{\infty,\max}$
0.900      2.312      2.232      0.004
0.800      2.213      2.063      0.009
0.700      2.121      1.906      0.015
0.600      2.011      1.725      0.022
0.500      1.873      1.517      0.031


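The last equality above is $f$ in (3.26). Assuming (3.27), $g$ in (3.28) is

$$g(\|\Delta\|_\infty, \|\Delta\|_\infty^2, \|A(n)\|_\infty, \|A(n)^2\|_\infty) = 2 \cdot \|\Delta\|_\infty \cdot \|A(n)^2\|_\infty + 2 \cdot \|\Delta\|_\infty^2 \cdot \|A(n)\|_\infty + \|\Delta\|_\infty \cdot \|A(n)\|_\infty^2.$$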
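The expansion behind $f$ and the bound $g$ for $j = 2$ can be sanity-checked numerically. The following is a minimal sketch; the matrix values are arbitrary assumptions, and `d` stands in for $\|\Delta\|$ as the larger of the two perturbation norms.

```python
import numpy as np

rng = np.random.default_rng(1)

def last_row_matrix(row):
    """2x2 matrix whose only non-zero row is the last one, as in (3.17)."""
    M = np.zeros((2, 2))
    M[-1, :] = row
    return M

A = np.array([[0.0, 1.0], [-0.3, 0.8]])               # hypothetical A(n)
D0 = last_row_matrix(0.05 * rng.standard_normal(2))   # Delta(n)
D1 = last_row_matrix(0.05 * rng.standard_normal(2))   # Delta(n+1)

# Perturbation part of the three-fold product, computed directly.
lhs = (A + D0 + D1) @ (A + D0) @ A - np.linalg.matrix_power(A, 3)
# The five-term expansion f.
rhs = D0 @ A @ A + D0 @ D0 @ A + D1 @ A @ A + D1 @ D0 @ A + A @ D0 @ A

norm = lambda M: np.linalg.norm(M, np.inf)
d = max(norm(D0), norm(D1))                            # plays the role of ||Delta||
g = 2 * d * norm(A @ A) + 2 * d**2 * norm(A) + d * norm(A) ** 2
print(norm(lhs), g)
```

Each of the five terms is bounded by sub-multiplicativity of the norm, which is where the coefficients 2, 2 and 1 in $g$ come from.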

Step III: For the second-order case considered here, it is extremely easy to find the corresponding maximum values of $\|A(n)\|_\infty$ and $\|A(n)^2\|_\infty$. In fact, these simultaneously occur at the same point. See Table I.


Step IV: The function $h$ in (3.29) is given by

$$h = [2 \cdot \|A\|_{\infty,\max}] \cdot \|\Delta\|_\infty^2 + [2\|A^2\|_{\infty,\max} + \|A\|_{\infty,\max}^2] \cdot \|\Delta\|_\infty - (1 - \delta).$$

Step V: The lowest positive root of $h$ gives the maximum allowable rate of change of coefficients within $\Omega_{\infty,\delta}^{(3)}$. See Table I.
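Steps IV and V can be reproduced from the tabulated maxima. The sketch below is an illustration only; the function name `max_rate_j2` is an assumption. It takes the single norm column of Table I for both $\|A\|_{\infty,\max}$ and $\|A^2\|_{\infty,\max}$ and solves the quadratic $h = 0$ for its positive root.

```python
import math

def max_rate_j2(delta, n_max):
    """Least positive root of h for j = 2, with ||A||_max = ||A^2||_max = n_max."""
    a = 2.0 * n_max                  # coefficient of ||Delta||^2
    b = 2.0 * n_max + n_max ** 2     # coefficient of ||Delta||
    c = -(1.0 - delta)               # constant term h_0
    return (-b + math.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)

# delta -> tabulated norm maximum (Table I)
table_1 = {0.9: 1.886, 0.8: 1.777, 0.7: 1.664, 0.6: 1.547, 0.5: 1.415}
rates = {d: max_rate_j2(d, n) for d, n in table_1.items()}
for d, r in rates.items():
    print(f"delta = {d:.1f}: ||Delta||_max ~ {r:.3f}")
```

Because all coefficients of $h$ other than $h_0$ are positive, the quadratic has exactly one positive root, so no root selection is needed here.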

Step VI: The procedure above was repeated for $j = 3$. The regions $\Omega_{\infty,\delta}^{(4)}$ are in Fig. 3 while the corresponding results are tabulated in Table II. Note how it verifies the remarks made under Step VI in Section III.

Fig. 3. The regions $\Omega_{\infty,\delta}^{(4)}$, $\delta$ = 0.5, 0.6, 0.7, 0.8, 0.9, 1, for a second-order system.
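For $j = 3$ the source does not print $h$. The sketch below therefore rests on an assumption: the four-fold product is expanded term by term and each term is bounded in the same way as for $j = 2$, which gives a cubic with coefficients $6N_1$, $6N_2 + 5N_1^2$ and $3N_3 + 3N_1N_2$, where $N_i$ denotes the maximum of $\|A(n)^i\|_\infty$. It further assumes that the first norm column of Table II serves for both $N_1$ and $N_2$. The function names are hypothetical.

```python
def least_positive_root(coeffs, hi=1.0, iters=60):
    """Bisection for the root in (0, hi) of sum(c_k x^k); coeffs = [c_0, c_1, ...]."""
    h = lambda x: sum(c * x ** k for k, c in enumerate(coeffs))
    lo = 0.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if h(mid) < 0.0 else (lo, mid)
    return 0.5 * (lo + hi)

def h_coeffs_j3(delta, n12, n3):
    """Assumed j = 3 bound with ||A||_max = ||A^2||_max = n12, ||A^3||_max = n3."""
    return [-(1.0 - delta),               # h_0
            3.0 * n3 + 3.0 * n12 * n12,   # 3 N3 + 3 N1 N2
            6.0 * n12 + 5.0 * n12 ** 2,   # 6 N2 + 5 N1^2
            6.0 * n12]                    # 6 N1

# delta -> (first norm column, second norm column) of Table II
table_2 = {0.9: (2.312, 2.232), 0.8: (2.213, 2.063), 0.7: (2.121, 1.906),
           0.6: (2.011, 1.725), 0.5: (1.873, 1.517)}
rates3 = {d: least_positive_root(h_coeffs_j3(d, *n)) for d, n in table_2.items()}
for d, r in rates3.items():
    print(f"delta = {d:.1f}: ||Delta||_max ~ {r:.3f}")
```

Under these assumptions the roots agree with the last column of Table II to the printed precision, and they are smaller than the $j = 2$ values, in line with the trade-off described under Step VI.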

Remarks: 

1) Similar computations may be done using the 2- and 1-norms as well.

2) The value of $\|\Delta\|_{\infty,\max}$ obtained with the use of the $\infty$-norm provides a diamond-shaped region at each coefficient $a(n) \in \Omega_{\infty,\delta}^{(j+1)}$ where it may be perturbed to $a(n+1) \in \Omega_{\infty,\delta}^{(j+1)}$. Similarly, $\|\Delta\|_{2,\max}$ corresponds to a circular region while $\|\Delta\|_{1,\max}$ corresponds to a box-shaped region.

3) Often, the coefficients are given to be TV and restricted to be within a box-shaped region in the coefficient space. Note the difference between the coefficients and the perturbations. Item 2 above describes the "shape" of possible perturbations.
Such  a  situation  may  be  easily  incorporated  into  the  above 
procedure  to  obtain  a  value  for  the  maximum  rate  of  change 
that  is  sufficient  for  AS. 

VI.  Conclusion  and  Final  Remarks 

The results presented in this paper provide a technique that may be utilized to obtain regions of AS in the coefficient space of linear TV difference equations. The regions thus obtained incorporate information regarding the maximum rate of change of the system parameters.

The technique proposed yields only sufficient conditions. In dealing with higher order systems, the computational burden may be quite heavy. However, the regions $\Omega_{p,\delta}^{(j+1)}$ are invariant for a given $m$, and hence, it is only necessary to compute them once. When $m = 2$, the application of the proposed method is quite straightforward. It is the authors' hope that this work will encourage improvements to the technique presented and development of alternate algorithms.
Two-Channel IIR QMF Banks with Approximately Linear-Phase Analysis and Synthesis Filters

Abstract- Perfect linear-phase two-channel QMF banks require the use of FIR analysis and synthesis filters. Although IIR filters are less expensive and yield superior stopband characteristics, perfect linear phase cannot be achieved with stable IIR filters. Thus, IIR designs usually incorporate a postprocessing equalizer which is optimized to reduce the phase distortion of the entire filter bank. However, the analysis and synthesis filters of such an IIR filter bank are not linear-phase. In this paper, a computationally simple method to obtain IIR analysis and synthesis filters that possess negligible phase distortion is presented. The method is based on first applying the balanced reduction procedure to obtain nearly all-pass IIR polyphase components and then approximating these with perfect all-pass IIR polyphase components. The resulting IIR designs already have only negligible phase distortion. However, if required, further improvement may be achieved through optimization of the filter parameters. For this purpose, a suitable objective function is presented. Bounds for the magnitude and phase errors of the designs are also derived. Design examples indicate that the derived IIR filter banks are more efficient in terms of computational complexity than the FIR prototypes. Simulations show that the IIR filters perform better than FIR Perfect Reconstruction systems under coefficient quantization.

1  Introduction 

A maximally decimated [1] two-channel Quadrature Mirror Filter (QMF) bank is shown in Fig. 1, where $H_0(z)$, $H_1(z)$ are the transfer functions of the analysis filters and $F_0(z)$, $F_1(z)$ are the synthesis filters [2]. The analysis bank separates the signal into two half-band signals and the synthesis bank reconstructs the signal from the two half-band signals. To achieve good separation between the two half-band signals the stop-band energies of the two analysis filters have to be minimized. The amount of stopband energy that can be tolerated will depend upon the application. For instance, in subband coding of speech [4], the spectral energy across a frequency range of 300-3000 Hz may exhibit a difference of 40 dB. Hence the analysis filters must have a stopband attenuation of at least 40 dB. However, with a frequency range of 100-6900 Hz, the required stopband attenuation may be as much as 60-70 dB.

The reconstructed signal in general suffers from aliasing distortion (ALD), amplitude distortion (AMD) and phase distortion (PHD) due to the fact that the analysis and synthesis filters are not ideal. Hence a common requirement in most applications is that the reconstructed signal, $\hat{x}(n)$, be as close to $x(n)$ as possible. However, other constraints are usually imposed to reduce nonlinear distortions, such as coding errors and transmission channel distortions, that cannot be directly evaluated. One such constraint that is usually imposed is that the analysis/synthesis filters be linear-phase [2]. In particular, such a constraint is typically imposed in digital audio applications.


Several techniques for the design of linear-phase finite impulse response (FIR) filters which eliminate ALD and PHD while minimizing AMD have been reported [4][5]. We shall call these Type I systems. We shall show that for a Type I system, to eliminate AMD completely, the polyphase components must be all-pass (AP). However, since it is not possible to have FIR AP filters (excepting the trivial case of a delay), it is only possible to minimize AMD by optimization. Techniques for the design of FIR filters which have the perfect reconstruction (PR) property, where all of ALD, AMD and PHD are eliminated, have also been reported [6][7][8]. It is also possible to incorporate the linear-phase property into these filters [9]. We shall call such a PR system with the linear-phase property a Type II system. However, the stop-band energy of a Type II filter is much greater than that of a Type I filter of the same length. It turns out that to achieve the same stop-band energy, a Type II filter requires approximately twice the length of a Type I filter. Hence the group delay of a Type II system is approximately twice that of a comparable Type I system. The number of multiplications per unit time (MPU's) for a Type II system (of twice the length of a Type I system) is the same as that for a Type I system, but the number of additions per unit time (APU's) is much higher [9].

Although structurally robust Type II systems can be implemented using the two-multiplier lattice structures proposed in [9], the computational complexity of these is twice that of the most efficient one-multiplier lattice implementation proposed in [3]. The two-multiplier lattice structures are robust due to the fact that the analysis and synthesis banks have coefficients with the same magnitude but opposite sign. However, the one-multiplier lattice structures are not structurally robust PR systems since the coefficients in the analysis and synthesis banks are different. It turns out that the Type II systems, when implemented with the one-multiplier lattice structure, are highly sensitive to coefficient quantization as illustrated by the examples in section 5.


It is well known that an infinite impulse response (IIR) filter which has the same stop-band energy as that of an FIR filter will be of much lower order and hence is less expensive and more efficient computationally. However, IIR filters are seldom used for filter bank systems because they inherently produce phase distortion. Several IIR filter banks which have no ALD and AMD have been reported [11][12][13][14]. The PHD is minimized by using a separate AP equalizer network once the signal is reconstructed [2]. Hence these IIR filter banks are not suitable for applications such as digital audio, where the linear-phase property of the analysis and synthesis filters is desired. In [10] a method based on the eigenfilter approach has been proposed for the design of IIR filter banks with an approximately linear phase response in the pass band of the analysis and synthesis banks. Although this method yields filters with very low stopband energy, the overall PHD is not minimized and hence the filters exhibit a high amount of PHD. An alternative approach to designing approximately linear-phase IIR filter banks, which has not been investigated so far, is to eliminate ALD and AMD and minimize PHD.

In this paper, a computationally and numerically efficient method for the design of IIR filter banks, where AMD and ALD are eliminated and PHD is minimized, is presented. Each of the analysis and synthesis filters is designed to have negligible phase distortion. The method, which is based on approximating an FIR filter with an IIR filter, consists of the following steps.

1. Design a suitable linear-phase FIR prototype filter. Such a filter must have polyphase components which are approximately AP.

2. Obtain an IIR filter bank having (approximately) the same magnitude and phase responses as those of the FIR prototypes by the application of the balanced reduction (BR) procedure.

3. Optimize the parameters of the IIR filter bank to get the best IIR approximation. Here, the main objective is to reduce the phase distortion as far as possible.

At this point, we must mention that it is possible to do a direct optimization without using the BR procedure. However, this method turns out to be computationally inefficient compared to the proposed method. A comparison of the time taken for the two methods is given in section 5.

The organization of the paper is as follows. In section 2, a brief overview of the design procedure of a suitable FIR prototype filter bank is presented. In section 3, a brief review of the BR algorithm together with some new results applicable to the task at hand are discussed. In section 4, a suitable objective function and its application for the optimization procedure are presented. Finally, in section 5, application examples, effects of coefficient quantization and roundoff noise, and a comparison with FIR filter banks are given.

2  Design  of  Linear-Phase  FIR  Filter  Banks 

In IIR filter banks, it is desirable to have the polyphase components as AP filters, since with this choice, AMD is eliminated [1] (see also equation (5)). Moreover, AP filters can be implemented very efficiently. With this in mind, the most suitable FIR prototype filter bank is the Type I FIR filter bank. As we shall show, for this class of filters, the polyphase components turn out to be nearly AP. This observation is crucial in approximating the prototype filter with true AP filters with no appreciable error. In general the polyphase components of Type II systems are not approximately AP.

The design of Type I FIR filter banks involves the minimization of AMD by optimizing the filter coefficients. Such optimization has been done by Johnston [4], and Jain and Crochiere [5]. We discuss briefly the method due to Johnston and show that the polyphase components are approximately AP.

For the filter bank in Fig. 1, the reconstructed signal is given by

$$\hat{X}(z) = \tfrac{1}{2}[H_0(z)F_0(z) + H_1(z)F_1(z)]X(z) + \tfrac{1}{2}[H_0(-z)F_0(z) + H_1(-z)F_1(z)]X(-z) \tag{1}$$

The second term represents the ALD which must be eliminated. Also in this method we design only the filter $H_0(z)$ and we choose

$$H_1(z) = H_0(-z) \tag{2}$$

$$F_0(z) = H_0(z) \quad \text{and} \quad F_1(z) = -H_0(-z) \tag{3}$$

The choice of (2) ensures that $H_1(z)$ is high-pass if $H_0(z)$ is low-pass. The choice of (3) ensures that ALD is eliminated and also efficient implementation of the filter bank is facilitated. With this choice, if we force $H_0(z)$ to be linear-phase then all the filters will be linear-phase and thus eliminate PHD.
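The alias cancellation produced by (2) and (3) can be checked with plain polynomial arithmetic. The sketch below is illustrative only: the four-tap filter is a made-up example and the helper names are assumptions.

```python
def polymul(p, q):
    """Multiply two polynomials in z^-1 given as coefficient lists."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for k, qk in enumerate(q):
            out[i + k] += pi * qk
    return out

def modulate(p):
    """Coefficients of P(-z): every odd power of z^-1 changes sign."""
    return [(-1) ** k * c for k, c in enumerate(p)]

h0 = [0.1, 0.4, 0.4, 0.1]            # hypothetical even-length low-pass H0
h1 = modulate(h0)                    # (2): H1(z) = H0(-z)
f0 = list(h0)                        # (3): F0(z) = H0(z)
f1 = [-c for c in modulate(h0)]      # (3): F1(z) = -H0(-z)

# Alias term of (1): H0(-z) F0(z) + H1(-z) F1(z)
alias = [x + y for x, y in zip(polymul(modulate(h0), f0),
                               polymul(modulate(h1), f1))]
# Distortion term of (1), before the factor 1/2: H0(z) F0(z) + H1(z) F1(z)
dist = [x + y for x, y in zip(polymul(h0, f0), polymul(h1, f1))]
print(alias, dist)
```

The alias coefficients vanish for any choice of `h0`, and the distortion term keeps only odd powers of $z^{-1}$, which is consistent with the factor $z^{-1}$ appearing in the overall transfer function.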

We can represent $H_0(z)$ in terms of its polyphase components [2] as

$$H_0(z) = E_0(z^2) + z^{-1}E_1(z^2) \tag{4}$$

The resulting polyphase representation of the filter bank is shown in Fig. 2. Now with ALD eliminated, the transfer function of the entire filter bank becomes

$$T(z) = \frac{\hat{X}(z)}{X(z)} = 2z^{-1}E_0(z^2)E_1(z^2) \tag{5}$$

Since $H_0(z)$ is linear phase it has to be of even length [1]. Hence it can be easily verified that

$$|E_0(e^{j2\omega})| = |E_1(e^{j2\omega})| \tag{6}$$
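Both (5) and (6) are easy to confirm on a small example. The sketch below uses a hypothetical symmetric four-tap filter; the names `freq` and `check` are assumptions made for the illustration.

```python
import cmath

h0 = [0.1, 0.4, 0.4, 0.1]       # hypothetical symmetric (linear-phase), even length
e0, e1 = h0[0::2], h0[1::2]     # polyphase components of (4)

def freq(p, w):
    """Evaluate P(e^{jw}) for P(z) = sum_k p[k] z^-k."""
    return sum(c * cmath.exp(-1j * w * k) for k, c in enumerate(p))

def check(w):
    H0 = freq(h0, w)
    H0m = freq([(-1) ** k * c for k, c in enumerate(h0)], w)   # H0(-z)
    T = 0.5 * (H0 * H0 - H0m * H0m)          # alias-free transfer function
    E0, E1 = freq(e0, 2 * w), freq(e1, 2 * w)  # E_i(z^2) on the unit circle
    # gap in (5), gap in (6)
    return abs(T - 2 * cmath.exp(-1j * w) * E0 * E1), abs(abs(E0) - abs(E1))

errors = [check(0.01 * k) for k in range(315)]
print(max(e[0] for e in errors), max(e[1] for e in errors))
```

For a symmetric even-length filter the second polyphase component is the time reverse of the first, which is why the two magnitudes coincide at every frequency.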

Hence, to completely eliminate AMD, we must have,

$$|E_0(e^{j2\omega})| = |E_1(e^{j2\omega})| = k \tag{7}$$

where $k$ is a real constant. In particular, we choose $k = \frac{1}{2}$ so that $H_0(1) = 1$. Keeping $E_0(z)$ and $E_1(z)$ to be FIR, due to the lack of AP property, this condition cannot be satisfied. In a typical design of $H_0(z)$ we first force it to be linear-phase and then optimize the parameters such that (7) holds approximately and the stop band energy of $H_0(z)$ is minimum. The objective function that has been used for this is [4]

$$\alpha \int_{\omega_s}^{\pi} |H_0(e^{j\omega})|^2 \, d\omega + \int_0^{\pi/2} \left[ |H_0(e^{j\omega})|^2 + |H_1(e^{j\omega})|^2 - 1 \right]^2 d\omega \tag{8}$$

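where $\omega_s$ is the stop band edge. It can be shown that due to (6), the objective function can be simplified to

$$\alpha \int_{\omega_s}^{\pi} |H_0(e^{j\omega})|^2 \, d\omega + \int_0^{\pi/2} \left[ 4|E_0(e^{j2\omega})|^2 - 1 \right]^2 d\omega \tag{9}$$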

Hence the filter parameters of the FIR filter are optimized such that (7) holds approximately.

Therefore for the optimized filter, although (7) does not hold, the following is true.

$$|E_0(e^{j2\omega})| = |E_1(e^{j2\omega})| \approx \tfrac{1}{2} \tag{10}$$

This implies that the polyphase components of the FIR filter bank, $E_0(z)$ and $E_1(z)$, are only approximately AP. Note that due to (2) and (3), in the design of the FIR prototype filter bank we need only to design $H_0(z)$.


In what follows, we shall approximate the FIR prototype filter $H_0(z)$ obtained as above with an IIR filter, $\hat{H}_0(z)$. The polyphase components of $\hat{H}_0(z)$ will be AP. The IIR filter bank is designed as a QMF bank. Hence the rest of the filters in the filter bank are obtained by relationships similar to (2) and (3). Hence AMD and ALD are eliminated. As we shall show, the approximation procedure ensures that the IIR filter $\hat{H}_0(z)$ is approximately linear phase in the entire baseband ($0 \le \omega \le \pi$) provided that the FIR filter $H_0(z)$ has the linear phase property. Due to this we have shown [15] that PHD is also negligible.

3  The  Balanced  Reduction  Procedure 

In this section we discuss the Balanced Reduction (BR) procedure. As proposed by Moore [16], this is a very attractive procedure to derive a reduced order model from a given high order system. In this method, the given state space (s.s) formulation is transformed into a coordinate system wherein each state is as reachable as it is observable. This transformed system is called balanced, and by deleting the least reachable and observable states, a reduced model of the original results. For the purpose at hand, since the higher-order system is nearly AP, the lower order subsystem must also be nearly AP. We shall show that with the BR procedure, the nearly AP nature of the reduced order system enables us to find this dominant subsystem easily. Frequency error bounds for the approximation are also presented.


3.1  The  Balanced  Realization 

The state-space realization of a system is not unique. The realization is minimal if it is both reachable and observable. Let $(A, B, C, D)$ be a minimal s.s realization of a stable transfer function $H(z)$ of order $m$. The two positive definite matrices $P$ and $Q$, which are called the reachability and observability gramians, are defined as

$$P = \frac{1}{2\pi j} \oint_{|z|=1} F(z)F^*(z) \frac{dz}{z} \tag{11}$$

$$Q = \frac{1}{2\pi j} \oint_{|z|=1} G^*(z)G(z) \frac{dz}{z} \tag{12}$$

$$\text{where } F(z) = [zI - A]^{-1}B \tag{13}$$

$$\text{and } G(z) = C[zI - A]^{-1} \tag{14}$$

$P$ and $Q$ can be found by solving the pair of Lyapunov equations given by

$$P - APA^T = BB^T \tag{15}$$

$$Q - A^TQA = C^TC \tag{16}$$

These can be solved efficiently using the Bartels-Stewart algorithm [17]. The Hankel singular values of the system are defined as the positive square roots of the eigenvalues of the matrix $PQ$. A non-singular similarity transformation $T$ of the state variables yields the similar system $(\hat{A}, \hat{B}, \hat{C}, D)$ where

$$\hat{A} = TAT^{-1}, \quad \hat{B} = TB, \quad \hat{C} = CT^{-1} \tag{17}$$

The transfer function and its Hankel singular values are invariant under a nonsingular similarity transformation. It is well known [16] that there exists a non-singular matrix $T$ such that the similar system $(\hat{A}, \hat{B}, \hat{C}, D)$ obtained as in (17) is a balanced realization in the sense that the corresponding gramians are diagonal and identical, that is, the reachability and observability gramians, $\hat{P}$ and $\hat{Q}$, of the system $(\hat{A}, \hat{B}, \hat{C}, D)$ take the form

$$\hat{Q} = \hat{P} = \Sigma = \mathrm{diag}(\sigma_1, \sigma_2, \ldots, \sigma_m) \tag{18}$$

Since the Hankel singular values are invariant under a similarity transformation, $\sigma_1^2 \ge \sigma_2^2 \ge \cdots \ge \sigma_m^2$ are the eigenvalues of $PQ$. Any realization satisfying (18) is called a balanced realization of $H(z)$. It must be noted that efficient and numerically stable methods to compute the balanced realization are available [18]. For example, see the routine 'dbalreal' in the MATLAB control system toolbox [20].
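A minimal sketch of (15), (16) and the Hankel singular values follows. It is not the Bartels-Stewart algorithm: the Lyapunov equations are solved by plain vectorisation, which is only sensible for small orders. The FIR taps and the function names are assumptions chosen for the illustration; a delay-line realization is used because its Hankel singular values can be cross-checked against the singular values of the Hankel matrix of the impulse response.

```python
import numpy as np

def dlyap(A, W):
    """Solve P - A P A^T = W by vectorisation (fine for small orders)."""
    n = A.shape[0]
    vec = np.linalg.solve(np.eye(n * n) - np.kron(A, A), W.reshape(-1))
    return vec.reshape(n, n)

def hankel_singular_values(A, B, C):
    P = dlyap(A, B @ B.T)            # reachability gramian, (15)
    Q = dlyap(A.T, C.T @ C)          # observability gramian, (16)
    eig = np.linalg.eigvals(P @ Q).real
    return np.sqrt(np.sort(np.clip(eig, 0.0, None))[::-1])

# Delay-line realization of a hypothetical FIR filter h_0 + h_1 z^-1 + ... + h_4 z^-4
h = np.array([0.2, 0.5, 0.3, -0.1, 0.05])
n = h.size - 1
A = np.diag(np.ones(n - 1), -1)      # shift register
B = np.eye(n)[:, :1]                 # input enters the first delay
C = h[1:].reshape(1, n)              # taps h_1 ... h_n

hsv = hankel_singular_values(A, B, C)
# Cross-check: singular values of the Hankel matrix built from h_1 ... h_n
H = np.array([[h[i + k + 1] if i + k + 1 <= n else 0.0 for k in range(n)]
              for i in range(n)])
print(hsv, np.linalg.svd(H, compute_uv=False))
```

For larger systems a dedicated Lyapunov solver would be preferable to the Kronecker-product system used here, whose size grows as the square of the order.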


3.2  The  Balanced  Approximation  of  the  Polyphase  Components 


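The first step in the BR procedure is to obtain a balanced realization of $H(z)$. The key to the reduction procedure is the matrix $\Sigma$. Let $\Sigma$ be decomposed into two parts:

$$\Sigma = \begin{bmatrix} \Sigma_1 & 0 \\ 0 & \Sigma_2 \end{bmatrix} \tag{19}$$

$$\Sigma_1 = \mathrm{diag}(\sigma_1, \sigma_2, \ldots, \sigma_{\hat{m}}) \tag{20}$$

$$\Sigma_2 = \mathrm{diag}(\sigma_{\hat{m}+1}, \sigma_{\hat{m}+2}, \ldots, \sigma_m) \tag{21}$$

We can represent the balanced realization according to the partition (19) as

$$\hat{A} = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}, \quad \hat{B} = \begin{bmatrix} B_1 \\ B_2 \end{bmatrix}, \quad \hat{C} = \begin{bmatrix} C_1 & C_2 \end{bmatrix} \tag{22}$$

If $\sigma_{\hat{m}} \gg \sigma_{\hat{m}+1}$, then $(A_{11}, B_1, C_1, D)$ represents the most observable and reachable part of $(\hat{A}, \hat{B}, \hat{C}, D)$. The system $(A_{11}, B_1, C_1, D)$ is called the balanced approximation (BA) of the original system $(A, B, C, D)$. The balanced approximation represents a good lower-order approximation of the original system if $\sigma_{\hat{m}} \gg \sigma_{\hat{m}+1}$. In fact, the following result is well known [21]. If $H(z)$ is the transfer function of the higher order system $(A, B, C, D)$ and $\hat{H}(z)$ is the transfer function of the BA $(A_{11}, B_1, C_1, D)$, the frequency response error is bounded as

$$\|H(z) - \hat{H}(z)\|_\infty \le 2 \sum_{i=\hat{m}+1}^{m} \sigma_i \tag{23}$$

where $\|\cdot\|_\infty = \sup_{\omega \in \Re} |\cdot(e^{j\omega})|$ and $\Re$ is the field of real numbers. Thus if $\sigma_{\hat{m}} \gg \sigma_{\hat{m}+1}$ we can expect a good approximation. It is also true that if $H(z)$ is stable then $\hat{H}(z)$ is stable.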
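The balancing, truncation and error bound can be sketched as follows. This is an illustration under stated assumptions: the third-order system is made up, the balancing uses a standard square-root (Cholesky plus SVD) construction rather than any routine named in the source, and the error is only sampled on a frequency grid.

```python
import numpy as np

def dlyap(A, W):
    """Solve P - A P A^T = W by vectorisation (small orders only)."""
    n = A.shape[0]
    vec = np.linalg.solve(np.eye(n * n) - np.kron(A, A), W.reshape(-1))
    return vec.reshape(n, n)

def balance(A, B, C):
    """Square-root balancing: balanced (A, B, C) and the Hankel singular values."""
    Lp = np.linalg.cholesky(dlyap(A, B @ B.T))
    Lq = np.linalg.cholesky(dlyap(A.T, C.T @ C))
    U, s, Vt = np.linalg.svd(Lq.T @ Lp)
    T = np.diag(s ** -0.5) @ U.T @ Lq.T          # state transformation of (17)
    Tinv = Lp @ Vt.T @ np.diag(s ** -0.5)
    return T @ A @ Tinv, T @ B, C @ Tinv, s

def freqresp(A, B, C, D, w):
    """Samples of C (e^{jw} I - A)^{-1} B + D."""
    n = A.shape[0]
    return np.array([(C @ np.linalg.solve(np.exp(1j * wk) * np.eye(n) - A, B))[0, 0] + D
                     for wk in w])

# Hypothetical stable third-order system
A = np.array([[0.5, 0.2, 0.0], [0.0, 0.3, 0.1], [0.1, 0.0, -0.4]])
B = np.array([[1.0], [0.5], [0.2]])
C = np.array([[1.0, -1.0, 0.5]])
D = 0.1

Ab, Bb, Cb, s = balance(A, B, C)
r = 2                                            # order kept after truncation
w = np.linspace(0.0, np.pi, 512)
err = np.max(np.abs(freqresp(A, B, C, D, w)
                    - freqresp(Ab[:r, :r], Bb[:r], Cb[:, :r], D, w)))
bound = 2.0 * s[r:].sum()                        # right-hand side of (23)
print(s, err, bound)
```

The sampled error should stay below the bound in (23); how tight the bound is depends on the gap between the retained and discarded singular values.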
The application of the BR procedure to obtain a reduced order system from a higher order one is illustrated by the flowchart below.


Flowchart for balanced reduction procedure.


For the task at hand, we have a high order approximately AP FIR filter and our aim is to find a reduced order IIR model which is nearly AP. With this in mind we prove the following theorem:

Theorem 3.1 Given a discrete-time state space realization $(A, B, C)$ which is balanced, its Hankel singular values are all equal ($PQ = \sigma^2 I$) if and only if we can find a $D$ such that the transfer function of $(A, B, C, D)$, $H(z)$, is AP, i.e. $|H(e^{j\omega})|^2 = \sigma^2$.


Proof:

Let the discrete time system have a s.s formulation $(A, B, C)$. Let us use the bilinear transform $z = \frac{1+s}{1-s}$ to obtain the continuous time system $H_c(s)$ with s.s formulation $(A_c, B_c, C_c)$ such that

$$H_c(s) = H\left(\frac{1+s}{1-s}\right)$$

Then the proof follows from the following facts.

1. The theorem is true for continuous time systems [22].

2. The Hankel singular values of the discrete time system $(A, B, C)$ are equal to those of the continuous time system $(A_c, B_c, C_c)$ [22].

3. We can find a $D$ such that $(A, B, C, D)$ is AP if and only if there is a $D_c$ such that $(A_c, B_c, C_c, D_c)$ is AP. This follows from the fact that $(A, B, C, D)$ is AP if and only if $(A_c, B_c, C_c, D_c)$ is AP.


In [19] it has been proved that an orthogonal realization ($P = Q = I$) exists for an all-pass filter. However, whether the converse is always true has not been established. Now according to Theorem 3.1, since the system is nearly AP, we can expect the singular values to be separated into two clusters: one having high and approximately equal values and the other having relatively small values. There must always be a large separation of the singular values since otherwise the system cannot be nearly AP. Hence there is always a partition as in (19) such that $\sigma_{\hat{m}} \gg \sigma_{\hat{m}+1}$; the best way to truncate is according to (19), and this will give us the IIR filter which is closest to an AP.
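One direction of Theorem 3.1 can be illustrated numerically: an exactly all-pass filter has Hankel singular values that are all equal. The sketch below uses a hypothetical second-order all-pass in controllable canonical form (an assumed realization, not one taken from the source) and a simple vectorised Lyapunov solve.

```python
import numpy as np

def dlyap(A, W):
    """Solve P - A P A^T = W by vectorisation (small orders only)."""
    n = A.shape[0]
    return np.linalg.solve(np.eye(n * n) - np.kron(A, A), W.reshape(-1)).reshape(n, n)

den = np.array([1.0, -0.5, 0.06])    # hypothetical stable denominator, poles 0.2 and 0.3
num = den[::-1]                      # mirror-image numerator gives an all-pass
n = den.size - 1

# Controllable canonical realization of num(z^-1) / den(z^-1)
A = np.zeros((n, n))
A[0, :] = -den[1:]
A[1:, :-1] = np.eye(n - 1)
B = np.eye(n)[:, :1]
C = (num[1:] - num[0] * den[1:]).reshape(1, n)

P = dlyap(A, B @ B.T)                # reachability gramian
Q = dlyap(A.T, C.T @ C)              # observability gramian
hsv = np.sqrt(np.sort(np.linalg.eigvals(P @ Q).real)[::-1])
print(hsv)
```

With unit passband gain the common value is 1; a nearly all-pass system would instead show a cluster of values close to its gain followed by much smaller ones.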

If a transfer function is AP, then the numerator polynomial is a mirror image of the denominator polynomial. Therefore if an IIR filter is nearly AP then the numerator polynomial is nearly a mirror image of the denominator. Now since we are looking for AP polyphase components, we must force the nearly AP transfer function to be AP. Hence we choose the numerator polynomial as the mirror image of the denominator polynomial. Alternatively we could choose the denominator as the mirror image of the numerator. However, we are not guaranteed that the mirror image of the numerator polynomial is stable, whereas it is guaranteed that the denominator polynomial is stable.
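The mirror-image construction can be checked directly on the unit circle. The denominator below is a hypothetical example and `freq` is an assumed helper name.

```python
import cmath

den = [1.0, -0.5, 0.06]     # hypothetical stable denominator in powers of z^-1
num = den[::-1]             # mirror image: coefficients in reverse order

def freq(p, w):
    """Evaluate P(e^{jw}) for P(z) = sum_k p[k] z^-k."""
    return sum(c * cmath.exp(-1j * w * k) for k, c in enumerate(p))

# Magnitude of num/den on a grid covering [0, pi]
mags = [abs(freq(num, w) / freq(den, w)) for w in (0.05 * k for k in range(64))]
print(min(mags), max(mags))
```

With real coefficients the reversed polynomial equals a pure delay times the complex conjugate of the denominator on the unit circle, so the magnitude is 1 at every frequency.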

3.3  Frequency  Error  Bounds 

We shall use the result in (23) to obtain bounds for the magnitude response error and the phase response error when the FIR prototype filter is approximated with an IIR filter with AP polyphase components. This will show that the phase response error of the IIR filter is negligible.

Let the prototype FIR filter designed as in section 2 have a polyphase decomposition

$$H(z) = 0.5[X_0(z^2) + z^{-1}X_1(z^2)] \tag{24}$$

where $H(z)$ is linear phase and $|X_0(e^{j2\omega})| = |X_1(e^{j2\omega})| \approx 1$. Let us approximate $H(z)$ with

$$\hat{H}(z) = 0.5[A_0(z^2) + z^{-1}A_1(z^2)] \tag{25}$$

where $A_0(z)$ and $A_1(z)$ are AP.

Theorem 3.2 The frequency and phase response errors of the approximation are bounded as

$$\|H(e^{j\omega}) - \hat{H}(e^{j\omega})\|_\infty \le \tfrac{1}{2}\|X_0(e^{j2\omega}) - A_0(e^{j2\omega})\|_\infty + \tfrac{1}{2}\|X_1(e^{j2\omega}) - A_1(e^{j2\omega})\|_\infty \tag{26}$$

$$\|\phi(\omega) - \hat{\phi}(\omega)\|_\infty \le \|X_0(e^{j2\omega}) - A_0(e^{j2\omega})\|_\infty + \|X_1(e^{j2\omega}) - A_1(e^{j2\omega})\|_\infty \tag{27}$$

provided that each of the quantities on the r.h.s is $\ll 1$.
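Proof:

The claim in (26) can be easily verified. Now due to (10) we have $|X_0(e^{j2\omega})| = |X_1(e^{j2\omega})|$. Therefore,
if the arguments of $X_0(e^{j2\omega})$ and $X_1(e^{j2\omega})$ are $\phi_0(\omega)$ and $\phi_1(\omega)$ respectively, we have

$H(e^{j\omega}) = 0.5\,|X_0(e^{j2\omega})|\,[e^{j\phi_0} + e^{j(\phi_1 - \omega)}]$  (28)

$\qquad\quad\;\; = |X_0(e^{j2\omega})| \cos\!\left(\frac{\phi_0 - \phi_1 + \omega}{2}\right) e^{j(\phi_0 + \phi_1 - \omega)/2}$  (29)

Hence the phase of $H(e^{j\omega})$ is given by

$\phi(\omega) = \frac{\phi_0 + \phi_1 - \omega}{2}$  (30)

Similarly, if the arguments of $A_0(e^{j2\omega})$ and $A_1(e^{j2\omega})$ are $\hat{\phi}_0(\omega)$ and $\hat{\phi}_1(\omega)$ respectively, we have

$\hat{H}(e^{j\omega}) = \cos\!\left(\frac{\hat{\phi}_0 - \hat{\phi}_1 + \omega}{2}\right) e^{j(\hat{\phi}_0 + \hat{\phi}_1 - \omega)/2}$  (31)

Hence the phase of $\hat{H}(e^{j\omega})$ is given by

$\hat{\phi}(\omega) = \frac{\hat{\phi}_0 + \hat{\phi}_1 - \omega}{2}$  (32)

Therefore

$|\phi(\omega) - \hat{\phi}(\omega)| = 0.5\,|(\phi_0 - \hat{\phi}_0) + (\phi_1 - \hat{\phi}_1)|$  (33)

$\qquad\qquad\qquad\; \le 0.5\,(|\phi_0 - \hat{\phi}_0| + |\phi_1 - \hat{\phi}_1|)$  (34)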


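Now

$|\phi_0(\omega) - \hat{\phi}_0(\omega)| = \left| \ln\!\left( \frac{X_0(e^{j2\omega})\,|A_0(e^{j2\omega})|}{A_0(e^{j2\omega})\,|X_0(e^{j2\omega})|} \right) \right|$  (35)

$\qquad\qquad\qquad\;\;\; = |\ln(1 + \delta_0(\omega))|$  (36)

$\qquad\qquad\qquad\;\;\; \approx |\delta_0(\omega)|$ when $|\delta_0| \ll 1$  (37)

where

$|\delta_0(\omega)| = \left| \frac{X_0(e^{j2\omega})\,|A_0(e^{j2\omega})|}{A_0(e^{j2\omega})\,|X_0(e^{j2\omega})|} - 1 \right|$  (38)

$\qquad\;\;\; \le \left| \frac{|X_0(e^{j2\omega})| - |A_0(e^{j2\omega})|}{A_0(e^{j2\omega})} \right| + \left| \frac{A_0(e^{j2\omega}) - X_0(e^{j2\omega})}{A_0(e^{j2\omega})} \right|$

$\qquad\;\;\; \le 2\,|A_0(e^{j2\omega}) - X_0(e^{j2\omega})|$  (39)

where we have used the fact that $|A_0(e^{j2\omega})| = 1$. Therefore, from (37) and (39)

$|\phi_0(\omega) - \hat{\phi}_0(\omega)| \le 2\,|A_0(e^{j2\omega}) - X_0(e^{j2\omega})|$  (40)

Similarly, we can show that

$|\phi_1(\omega) - \hat{\phi}_1(\omega)| \le 2\,|A_1(e^{j2\omega}) - X_1(e^{j2\omega})|$  (41)

Therefore, the claim follows from (34), (40) and (41).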
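
The closed forms in (29) and (31) rest on the identity $e^{ja} + e^{jb} = 2\cos((a-b)/2)\,e^{j(a+b)/2}$. The following is a minimal numerical sketch of that identity for two unit-magnitude branches, as for the AP components. The function names are illustrative and not taken from the paper.

```python
import numpy as np

def two_branch_sum(phi0, phi1, w):
    """0.5*[e^{j*phi0} + e^{j*(phi1 - w)}]: sum of two unit-magnitude branches."""
    return 0.5 * (np.exp(1j * phi0) + np.exp(1j * (phi1 - w)))

def closed_form(phi0, phi1, w):
    """cos((phi0 - phi1 + w)/2) * e^{j*(phi0 + phi1 - w)/2}, the form used in (31)."""
    return np.cos((phi0 - phi1 + w) / 2) * np.exp(1j * (phi0 + phi1 - w) / 2)

# Random branch phases and frequencies (arbitrary test points, not filter data).
rng = np.random.default_rng(0)
phi0, phi1, w = rng.uniform(-np.pi, np.pi, size=(3, 1000))

max_gap = np.max(np.abs(two_branch_sum(phi0, phi1, w) - closed_form(phi0, phi1, w)))
print("largest deviation between the two forms:", max_gap)
```

Since the cosine factor is real, the phase of the sum is $(\phi_0 + \phi_1 - \omega)/2$ up to a multiple of $\pi$, which is what (30) and (32) use.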


Now the BA of $X_0(z)$ is the nearly AP transfer function $\hat{X}_0(z)$. We choose $A_0(z)$ to be the AP filter
whose denominator is equal to that of $\hat{X}_0(z)$. Then it turns out that the numerator coefficients of
$A_0(z)$ and $\hat{X}_0(z)$ are very close due to the approximately AP nature of $\hat{X}_0(z)$. In Appendix A we show
that the frequency response error between $A_0(z)$ and $\hat{X}_0(z)$ is small provided that their corresponding
numerator coefficients are approximately equal and that $\hat{X}_0(z)$ has a good stability margin. Therefore

$\|\hat{X}_0(z) - A_0(z)\|_\infty \le \epsilon_0$  (42)

$\|\hat{X}_1(z) - A_1(z)\|_\infty \le \epsilon_1$  (43)

where $\epsilon_0$ and $\epsilon_1$ are small. From (23), (42), (43) and Theorem 3.2, it follows that

$\|H(e^{j\omega}) - \hat{H}(e^{j\omega})\|_\infty \le \tfrac{1}{2}\sum_{i=0}^{1}\sum_{j}^{m_i} \sigma_{ij} + \tfrac{1}{2}(\epsilon_0 + \epsilon_1)$  (44)

$\|\phi(\omega) - \hat{\phi}(\omega)\|_\infty \le \sum_{i=0}^{1}\sum_{j}^{m_i} \sigma_{ij} + \epsilon_0 + \epsilon_1$  (45)

where $\sigma_{ij}$ are the discarded singular values of $X_i(z)$.

From these results, we conclude that with the BA technique, the magnitude response error and the
phase response error of the IIR approximation are small since the discarded Hankel singular values of
the two approximately AP polyphase components of the FIR prototype filter are small.

To illustrate the application of the BR procedure to obtain an IIR allpass filter from a nearly
allpass FIR filter we present an example.
Example 4.1: We start with an approximately allpass FIR filter given by

$H(z) = -0.0076 - 0.0054 z^{-1} + 0.1769 z^{-2} + 0.9688 z^{-3} - 0.1694 z^{-4} + 0.0377 z^{-5}$

This filter is $2E_0(z)$ (i.e., the 0th polyphase component with the magnitude normalized) of the Johnston
12A filter tabulated in [4]. The canonical state-space realization of this FIR filter is given in Table I.


Filter   A                                                        B           C           D

FIR       0.000000   0.000000   0.000000   0.000000   0.000000    1.000000   -0.005421   -0.007619
          1.000000   0.000000   0.000000   0.000000   0.000000    0.000000    0.176940
          0.000000   1.000000   0.000000   0.000000   0.000000    0.000000    0.968779
          0.000000   0.000000   1.000000   0.000000   0.000000    0.000000   -0.169391
          0.000000   0.000000   0.000000   1.000000   0.000000    0.000000    0.037713

IIR      -0.092316  -0.617899   0.327138                         -0.709002    0.709002   -0.007619
          0.617899   0.565276   0.369593                         -0.402579   -0.402579
         -0.327138   0.369593  -0.648984                         -0.578959   -0.578959

Table I: State space realizations of the FIR and IIR filters in Example 4.1.




A balancing transformation that diagonalizes the gramians is

$\begin{bmatrix}
-0.7090 & -0.4026 & -0.5790 & -0.2219 & 0.2093 \\
0.1248 & -0.8796 & 0.4589 & -0.1408 & 0.9943 \\
0.6822 & -0.2504 & -0.6637 & 5.2256 & -0.9103 \\
-0.1248 & 0.0350 & 0.1154 & 28.1241 & -30.6132 \\
0.0267 & -0.0152 & -0.0218 & -7.2917 & -118.7535
\end{bmatrix}$

The diagonal elements of the diagonalized gramians are

$\operatorname{diag}\Sigma = [\,1.000036 \;\; 1.000028 \;\; 0.999998 \;\; 0.001147 \;\; 0.000066\,]$
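
As a rough cross-check of these values, the Hankel singular values of an FIR filter can be computed as the singular values of the Hankel matrix built from its impulse response. The sketch below does this for the filter of this example using the six-digit coefficients of Table I. It is an illustration with NumPy, not the MATLAB procedure used by the authors, and small differences in the last digits are to be expected.

```python
import numpy as np

# Impulse response samples h(1)..h(5) of the FIR filter (the C entries of Table I).
# h(0) = D is the direct feed-through term and does not enter the Hankel matrix.
h = np.array([-0.005421, 0.176940, 0.968779, -0.169391, 0.037713])

n = len(h)
hankel = np.zeros((n, n))
for i in range(n):
    for j in range(n - i):
        hankel[i, j] = h[i + j]      # entry (i, j) holds h(i + j + 1), zero beyond the filter length

# The singular values of the Hankel matrix are the Hankel singular values of the filter.
sv = np.linalg.svd(hankel, compute_uv=False)
print(np.round(sv, 6))
```

Three values cluster near 1 and the remaining two are much smaller, which is the pattern that suggests a third-order nearly allpass subsystem.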

It is seen that the first three elements are close to 1 and the rest are much smaller. Hence the order of the
IIR filter is 3. Next the balanced system is found and truncated to obtain the 3rd-order approximately
allpass IIR filter in Table I. The transfer function of this IIR filter is

$\hat{H}(z) = \frac{-0.007619 - 0.006762 z^{-1} + 0.176036 z^{-2} + 1.000014 z^{-3}}{1.000000 + 0.176024 z^{-1} - 0.006907 z^{-2} - 0.008608 z^{-3}}$

which is approximately allpass. Hence, according to the proposed method, by choosing the numerator
as the mirror image of the denominator the allpass filter is

$A(z) = \frac{-0.008608 - 0.006907 z^{-1} + 0.176024 z^{-2} + 1.000000 z^{-3}}{1.000000 + 0.176024 z^{-1} - 0.006907 z^{-2} - 0.008608 z^{-3}}$

The frequency response error bound for the BR is

$\|H(e^{j\omega}) - \hat{H}(e^{j\omega})\|_\infty \le 0.004852$

Hence the phase error bound using (40) is

$\|\phi_h(\omega) - \phi_a(\omega)\|_\infty \le (0.004852 + \epsilon)$ rad

where $\epsilon$ is a small quantity. The actual phase error between $\hat{H}(e^{j\omega})$ and $H(e^{j\omega})$ is 0.002130 rad and
that between $H(e^{j\omega})$ and $A(e^{j\omega})$ is

$\|\phi_h(\omega) - \phi_a(\omega)\|_\infty = 0.003138$ rad

This shows that the error bound gives a good estimate of the error that can be expected.
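
The step from the truncated balanced realization to the allpass filter can be sketched numerically. The code below takes the IIR entries of Table I, forms the transfer function using the determinant identity det(zI - A + BC) = det(zI - A)(1 + C(zI - A)^{-1}B), and then replaces the numerator by the mirror image of the denominator. It is a minimal NumPy illustration under the assumption that the table is read row by row. The helper name `freq_resp` is ours.

```python
import numpy as np

# Truncated balanced realization (IIR rows of Table I).
A = np.array([[-0.092316, -0.617899,  0.327138],
              [ 0.617899,  0.565276,  0.369593],
              [-0.327138,  0.369593, -0.648984]])
B = np.array([[-0.709002], [-0.402579], [-0.578959]])
C = np.array([[ 0.709002, -0.402579, -0.578959]])
D = -0.007619

# Coefficients of z^0, z^-1, z^-2, z^-3 after dividing by z^3.
den = np.real(np.poly(A))                          # det(zI - A)
num = np.real(np.poly(A - B @ C)) + (D - 1) * den  # D*den + C adj(zI - A) B

# Allpass filter: numerator is the mirror image (coefficient reversal) of the denominator.
num_ap = den[::-1]

def freq_resp(b, a, w):
    """Evaluate sum_k b[k] z^-k / sum_k a[k] z^-k on z = e^{jw}."""
    zinv = np.exp(-1j * w)
    return np.polyval(b[::-1], zinv) / np.polyval(a[::-1], zinv)

w = np.linspace(0.0, np.pi, 512)
mag_ap = np.abs(freq_resp(num_ap, den, w))         # should equal 1 at every frequency
pole_radius = np.max(np.abs(np.roots(den)))        # below 1 for a stable filter

print("denominator:", np.round(den, 6))
print("numerator:  ", np.round(num, 6))
print("largest pole radius:", pole_radius)
```

Because the numerator is built from the denominator, the unit magnitude holds by construction, and stability is inherited from the balanced truncation.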

The balanced approximation is not the optimal lower order approximation, but we choose it because of
its computational advantage. The question also arises as to whether the optimal AP approximation to
a nearly AP IIR filter is obtained by choosing the numerator polynomial to be the mirror image of the
denominator polynomial. The only reason for this choice is that, because the BA is guaranteed to provide a
stable transfer function, the denominator polynomial must be stable. Another choice is to choose the
denominator polynomial to be the mirror image of the numerator. However, in this instance we cannot
guarantee that the resulting filter will be stable. Due to these two facts we cannot be sure whether
this gives the AP filter which is closest to the original. Hence we optimize the coefficients of the filter
to obtain an optimal filter.
4 Optimization of the Filter Parameters


We shall now use the filter parameters obtained by the application of the BR procedure as initial values
and find the optimum parameters for the filter. In the design of IIR digital filters the most popular
method is the Fletcher-Powell optimization procedure [23]. This algorithm has several advantages.
Only first derivative information of the objective function is required, which in this case is readily
obtained as we shall show. It has been used with success by many people [24] [25], demonstrating its
good convergence properties. A programmed version of the algorithm is available in the MATLAB
optimization toolbox. This program allows bounds to be defined on the filter parameters, which is
useful to ensure that the filter remains stable.


We shall not discuss the optimization procedure, but merely state that the rate of convergence of the
method depends on the accuracy of the first derivative information of the objective function. Hence,
although it is possible to compute the first derivatives numerically, we shall use analytical expressions.

4.1  The  Form  of  the  Filter  Transfer  Function 

The filter transfer function takes the form in equation (31). The optimization is done for the filter
$\hat{H}_0(z)$ and not for the AP polyphase components. The orders of these two AP filters are the same as
the orders of the AP filters obtained by applying the BR procedure to the polyphase components. Let
each $A_i(z)$ obtained by applying the BR procedure have $(m_i + 2n_i)$ poles of which $2n_i$ are complex
pairs. Then the AP filters take the form

$A_i(z) = \prod_{k=1}^{n_i} \frac{r_{ik}^2 + 2 r_{ik}\cos\phi_{ik}\, z^{-1} + z^{-2}}{1 + 2 r_{ik}\cos\phi_{ik}\, z^{-1} + r_{ik}^2 z^{-2}} \; \prod_{k=n_i+1}^{m_i} \frac{r_{ik} + z^{-1}}{1 + r_{ik} z^{-1}}$  (46)


We choose this cascade form for the polyphase components for several reasons. Firstly, stability of
a cascade filter is readily tested. This is important since during the process of optimization, we have
to ensure that the filter is stable. Secondly, errors due to quantization and finite wordlength are
much smaller relative to other forms [26]. Finally, the magnitude response and the group delay of $\hat{H}_0(z)$
take a particularly simple functional form, permitting easy calculation of the first derivatives.
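
To make the first two points concrete, the sketch below evaluates a cascade allpass filter from pole radii and angles and tests stability section by section. The section forms are an assumption on our part: a real-coefficient second-order section for each complex pole pair at $-r e^{\pm j\phi}$ and a first-order section for each real pole at $-r$, chosen to match the denominators appearing in the group delay expression. The function names are illustrative.

```python
import numpy as np

def allpass_response(r_complex, phi, r_real, w):
    """Frequency response of a cascade allpass filter at z = e^{jw}.

    Assumed section forms (a reconstruction, not verbatim from the paper):
      complex pair: (r^2 + 2 r cos(phi) z^-1 + z^-2) / (1 + 2 r cos(phi) z^-1 + r^2 z^-2)
      real pole:    (r + z^-1) / (1 + r z^-1)
    """
    zi = np.exp(-1j * np.asarray(w, dtype=float))
    resp = np.ones_like(zi)
    for r, p in zip(r_complex, phi):
        c = 2.0 * r * np.cos(p)
        resp = resp * (r**2 + c * zi + zi**2) / (1.0 + c * zi + r**2 * zi**2)
    for r in r_real:
        resp = resp * (r + zi) / (1.0 + r * zi)
    return resp

def is_stable(r_complex, r_real):
    """Every pole radius must be strictly inside the unit circle."""
    return all(abs(r) < 1.0 for r in list(r_complex) + list(r_real))

# Arbitrary example parameters (not design values from the paper).
w = np.linspace(0.0, np.pi, 256)
resp = allpass_response([0.5, 0.3], [0.7, 2.0], [0.2, -0.4], w)
print("max deviation from unit magnitude:", np.max(np.abs(np.abs(resp) - 1.0)))
print("stable:", is_stable([0.5, 0.3], [0.2, -0.4]))
```

For a polyphase branch $A_i(z^2)$ the same function can be called with `2 * w` in place of `w`. The stability test reduces to a bound on each radius, which is why bound constraints in the optimizer are sufficient.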

4.2  The  Objective  Function 

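Our objective in the optimization procedure is to minimize the magnitude response error and the group
delay distortion. Hence our objective function is

$f(\mathbf{x}) = \gamma \int \left(M_0(e^{j\omega}) - \hat{M}_0(e^{j\omega})\right)^2 d\omega + (1 - \gamma) \int (\tau - \hat{\tau})^2\, d\omega, \qquad 0 \le \gamma \le 1$  (47)

where $\mathbf{x} = (r_{01}, r_{02}, \ldots, r_{0m_0}, r_{11}, r_{12}, \ldots, r_{1m_1}, \phi_{01}, \phi_{02}, \ldots, \phi_{0n_0}, \phi_{11}, \phi_{12}, \ldots, \phi_{1n_1})$, $M_0(e^{j\omega})$ and $\hat{M}_0(e^{j\omega})$
are differentiable magnitude response functions of the prototype FIR filter and the IIR filter respectively,
and $\tau$, $\hat{\tau}$ are the respective group delays. A differentiable magnitude response function of $\hat{H}_0(e^{j\omega})$
is readily obtained from (31) as

$\hat{M}_0(e^{j\omega}) = \cos(\alpha_1 - \alpha_0 - \omega/2)$  (48)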

where 


-EE— E 

j — u  K  —  L  A:=n.;+1  '  ' 

and  the  group  delay  is  given  by 


)oj 


(49) 


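$\hat{\tau} = 0.5 + \sum_{i=0}^{1} \left[ \sum_{k=1}^{n_i} \sum_{j=0}^{1} \frac{1 - r_{i,k}^2}{1 + 2 r_{i,k} \cos(2\omega + (-1)^j \phi_{i,k}) + r_{i,k}^2} + \sum_{k=n_i+1}^{m_i} \frac{1 - r_{i,k}^2}{1 + 2 r_{i,k} \cos(2\omega) + r_{i,k}^2} \right]$  (50)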


For  completeness  we  give  the  partial  derivatives  of  /(x).  The  partial  derivatives  of  the  objective 
function  /(x)  w.r.t  x,  where  a;  is  an  element  of  x  will  be 


—  ^7 ^i^o{z)  —Mo{z))~  +  2{l  - '))  j>{T  —  T)^dz 

where  if  i  =  0, 1  and  j  =  1, 2, . . . ,  nj 


(51) 


d<f)ij 

dMojz) 

drij 

dr 


drij 

df 


(-1)*  sin(ai  -  do  -  w/2)  y](-l)*rjj 

A:=0 


cos(2a;  +  {-l)^4>ij)  +  nj 


1  +  2rij  cos(2cj  +  (— 

sm(a,  -  <«  -  w/2)  f - sm(2^  +  (-l)Vv) _ _ 

^  1+  2rij  cos(2w  +  (-!)'=  (/>y)  +  rfj 

^  ^  cos(2a;  +  (pij)  +  2rij  +  r?-  cos(2a;  +  (-l)^(/>ij) 


fc=0 


(1  +  2rij  cos(2a;  +  +  r?.)^ 


g  ^  (-l)*'nj(l  -  rl)  sin(2a;  +  {-l)’=4>ij) 
(1  +  2rij  cos(2a;  +  )  +  r?  )2 


and  if  z  =  0, 1  and  j  =  ni  +  1,  +  2, . . . ,  mi 


dMQ{z)  _  sin(ai  —  ao  —  Ct;/2)  sin 2a; 

5rij  1  +  2rij  cos  2a;  +  r?- 

_  —  2{cos  2a;  +  2rij  +  r^j  cos  2a;) 

drij  ( 1  +  2rij  cos  2a;  +  r?- )  2 

In the optimization procedure we minimize $f(\mathbf{x})$ subject to $|r_{ij}| < 1$.


(52) 

(53) 

(54) 

(55) 

(56) 

(57) 


5  Design  Examples 

In this section, we demonstrate the application of the proposed method to design an IIR filter bank
with each analysis filter having a stop band edge $\omega_s$ = 0.586 and a stop-band attenuation of 65 dB.
Results of the design of two other filters which have the same $\omega_s$, but different stopband energies are
also given. A comparison is made with comparable Type I and Type II FIR systems.


Example 5.1: The design of FIR filters with the desired properties referred to in this paper has
been presented earlier by Johnston [4]. We choose a filter which satisfies the above specifications from
the appendix of [4]. This turns out to be a filter of length 64, which is referred to as the 64D filter.
The magnitude response of this filter is shown in Fig. 3 and those of the two polyphase components are
shown in Fig. 4. This shows that the polyphase components are approximately AP.

We now find the balanced realization for each polyphase component. The largest twenty singular
values of the realization are tabulated in Table II. It can be clearly seen that each polyphase component
has a dominant subsystem which is AP. For the 0th polyphase component, the order of the AP
subsystem is 16, while that for the 1st polyphase component is 15. However we must mention that
these singular values are not exactly equal. There are differences of the order of 10“^ in these singular
values. Nevertheless, these are very closely clustered together and we can expect it to be nearly AP.
Now we truncate the two polyphase systems and find the IIR approximations. The two denominator
polynomials of the truncated systems, $d_0(z)$ and $d_1(z)$, are shown in Table III. We choose the numerator
polynomial of each polyphase component to be the mirror image of the respective denominator
polynomial.


  m    σ_{0,m}   σ_{1,m}

  1    0.5000    0.5000
  2    0.5000    0.5000
  3    0.5000    0.5000
  4    0.5000    0.5000
  5    0.5000    0.5000
  6    0.5000    0.5000
  7    0.5000    0.5000
  8    0.5000    0.5000
  9    0.5000    0.5000
 10    0.5000    0.5000
 11    0.5000    0.5000
 12    0.5000    0.5000
 13    0.5000    0.5000
 14    0.5000    0.5000
 15    0.5000    0.5000
 16    0.5000    0.0001
 17    0.0001    0.0001
 18    0.0000    0.0000
 19    0.0000    0.0000
 20    0.0000    0.0000

Table II: Largest 20 Hankel singular values for the two polyphase components in Example 5.1.


The magnitude response and the group delay of the IIR filter $\hat{H}_0(z) = 0.5[A_0(z^2) + z^{-1}A_1(z^2)]$ are shown
in Fig. 5, where the denominator polynomials of the AP polyphase components are given in Table III.
One can see that the approximation error is very small. The theoretical error bound for the phase
response with the BR procedure given in (45), neglecting $\epsilon_0$ and $\epsilon_1$, is 7.4 x 10““^, while the actual
phase error for the AP approximation is 4.3 x 10“^. This indicates that the theoretical error bound
for the BR procedure gives a good estimate of the actual error. The computer time taken for the
BR procedure is approximately 0.64 seconds and 1.1 x 10® floating point operations (FLOPS) on a
DEC 5000 workstation using the MATLAB control system toolbox.

Now, with the parameters of the BR design as the initial values, we find the filter $\hat{H}_0(z)$ which is optimal
in the sense that (47) is minimized. The denominator polynomials of the polyphase
components of the optimized filter are given in Table III and the magnitude response and group delay are shown
in Fig. 6. In this example the improvement in the group delay distortion is only marginal, indicating
that the BR procedure by itself yields nearly optimal filters. The optimization
requires about 5 minutes of computer time. We could achieve the same results with a direct optimization without using
the BR technique, but this requires 17 minutes of computer time.

Example 5.2: We could also use the optimization procedure to reduce the group delay distortion at
the expense of increased stopband energy. If we use the 48D FIR filter as the prototype, we obtain
an IIR approximation with stop band attenuation greater than 50 dB and maximum group delay
distortion ±0.06 samples. With optimization we can reduce the maximum group delay distortion to
±0.0125 samples if a stop band attenuation of 42.5 dB is satisfactory. The denominator coefficients of
this filter are given in Table B-I of the appendix.

Example 5.3: The denominator coefficients of a design using a lower order FIR prototype (32D Johnston
filter) are shown in Table B-II of Appendix B. In this case, since the order of the IIR filter is lower, the
stopband attenuation is also low and the group delay distortion is somewhat higher than for the higher
order designs. In this case no appreciable improvement in the group delay distortion could be obtained
at the expense of stopband attenuation.

The maximum stopband attenuation that can be achieved is the stopband attenuation of the
prototype filter. This is because we minimize the frequency response error of the IIR filter with respect to the
FIR prototype. If one can tolerate more group delay distortion but needs higher stopband attenuation,
an FIR prototype filter has to be designed with the required stopband attenuation while allowing for a
larger AMD. This can be achieved by choosing a larger weighting factor $\alpha$ in (8).
  n    d_0(n)               d_0(n)               d_1(n)               d_1(n)

  0    1.00000000000000     1.00000000000000     1.00000000000000     1.00000000000000
  1    0.24510479134270     0.24510472044679    -0.24510331217525    -0.24510335220548
  2   -0.08540446106043    -0.08540104480996     0.14547975580225     0.14548313728944
  3    0.04372217148846     0.04372495061969    -0.10031250245546    -0.10031129317841
  4   -0.02473518071621    -0.02473108389726     0.07246314200969     0.07246687835227
  5    0.01413313848422     0.01412938906084    -0.05288462490705    -0.05289001097297
  6   -0.00769194123910    -0.00768883641505     0.03829127599374     0.03829712692391
  7    0.00369791714860     0.00369704301735    -0.02719069757361    -0.02719472555477
  8   -0.00128353807212    -0.00127508069423     0.01876589218446     0.01877657358317
  9   -0.00009405863941    -0.00013315486101    -0.01245789864033    -0.01250200323171
 10    0.00068111709418     0.00065913509141     0.00794962259068     0.00794917507237
 11   -0.00089067178133    -0.00088585259755    -0.00477740314111    -0.00477542934039
 12    0.00082093773822     0.00082071701203     0.00267592696004     0.00267324786445
 13   -0.00062700877872    -0.00062822205023    -0.00136646062031    -0.00136132437679
 14    0.00039360412931     0.00039431963754     0.00062174841299     0.00060977797019
 15   -0.00019341002057    -0.00019017326905    -0.00023563382949    -0.00018118324745
 16    0.00008049208892     0.00011897794914     -                    -

Table III: Denominator polynomial coefficients of the two polyphase components in Example 5.1.

5.1  Effects  of  Coefficient  Quantization 

Since a digital filter is always implemented with a finite wordlength (FWL), it is important to determine
the performance of the filter when the designed coefficients are quantized. We assume the filter is to be
implemented in a floating point processor with a finite number of bits in the mantissa. The following
filters were implemented in FWL:

•  The 64D Johnston Type I filter.

•  The length-64 Type II filter presented in [9], implemented in a 1-multiplier lattice structure [3].

•  The IIR filters designed using the 32D, 48D and 64D Johnston filters as the prototypes.

A 12-bit mantissa was used in the implementation of all three filters. Fig. 7 shows the overall
magnitude response $|T(e^{j\omega})|$ for the Type I and Type II filters. For the IIR filter $|T(e^{j\omega})|$ is perfectly flat
and for the Type I filter it is almost flat. One can see that the Type II system is far from a PR system
under coefficient quantization. The Type I system seems far better than the Type II system. For Type I
and Type II systems the group delay is perfectly flat, since the linear phase property is guaranteed
under any level of coefficient quantization. For the IIR filter banks the group delay distortion increases
marginally. Fig. 8 shows the magnitude response of the Type II, Type I and IIR lowpass filters. It is
clearly seen that the Type II system is extremely sensitive to coefficient quantization effects. The IIR
and Type I filters have very low sensitivity. As seen in Fig. 8, even for a high order filter such as the
64D filter, there is no appreciable change in the magnitude response. Results are tabulated in Table IV.

The IIR filter designed in [10] with approximate linear phase characteristics in the pass band was
also implemented in FWL. This filter has a stop band attenuation of 71.2 dB but has a group delay
distortion of about +16 samples. When implemented with a 12-bit mantissa the stopband attenuation
dropped marginally to 69.4 dB.
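
The reason $|T(e^{j\omega})|$ stays flat for the IIR bank can be illustrated with a short sketch: if the quantized denominator is reused, mirrored, as the numerator, each branch remains exactly allpass whatever the wordlength. The code below rounds coefficients to a 12-bit mantissa and checks this for the third-order denominator of Example 4.1. The rounding rule (round to nearest on the mantissa) is our assumption, since the paper does not state the one it used, and `quantize_mantissa` is an illustrative name.

```python
import numpy as np

def quantize_mantissa(x, bits):
    """Round each value to `bits` mantissa bits, keeping sign and exponent exact."""
    m, e = np.frexp(np.asarray(x, dtype=float))      # x = m * 2**e with 0.5 <= |m| < 1
    return np.ldexp(np.round(m * 2.0**bits) / 2.0**bits, e)

# Denominator of the third-order allpass filter from Example 4.1.
den = np.array([1.0, 0.176024, -0.006907, -0.008608])
den_q = quantize_mantissa(den, 12)

# The allpass structure reuses the same quantized coefficients in mirrored order.
num_q = den_q[::-1]

w = np.linspace(0.0, np.pi, 512)
zinv = np.exp(-1j * w)
mag = np.abs(np.polyval(num_q[::-1], zinv) / np.polyval(den_q[::-1], zinv))

print("quantized denominator:", den_q)
print("max |A| deviation from 1:", np.max(np.abs(mag - 1.0)))
print("largest pole radius:", np.max(np.abs(np.roots(den_q))))
```

Quantization moves the poles, and with them the phase and group delay, but cannot disturb the unit magnitude of each branch.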


5.2  Effects  of  Roundoff  Noise 


To investigate effects of roundoff noise, the filter banks were implemented with FWL and infinite
wordlength (IWL). The FWL implementation used 12 bits in the mantissa and double precision
for the internal registers (i.e., intermediate values are stored as floating point numbers with 24 bits
in the mantissa). Due to practical constraints, in the IWL implementation the number of bits in the
mantissa was 26. The input signal was white noise with a flat spectrum and random phase. The signal
was passed through the filter bank and reconstructed at the output. The signal-to-noise
ratio (SNR) at the output was calculated using


x(n)^ 

=0 


(58) 


The  results  are  tabulated  in  Table  IV. 

For the IIR filter in [10] the SNR was only about 5 dB (for IWL and FWL). However, when a fairly
smooth signal (band limited to the lower end of the frequency spectrum) was used the SNR increased
to 45 dB. Hence the very poor SNR can be attributed to the very large group delay distortion. Hence
for this filter bank too, a postprocessing AP equalizer network is necessary.


5.3  Comparison  with  FIR  filters 

An  AP  filter  of  order  p  with  real  coefficients  can  be  implemented  with  p  multipliers  and  2p  adders 
[27].  However  if  we  choose  the  structure  in  Fig.  2,  each  polyphase  component  is  computing  at  half 
rate.  Hence  each  polyphase  component  needs  only  p/2  multiplications  per  unit  time  (MPU’s)  and  p 
additions  per  unit  time  (APU’s),  where  p  is  the  order  of  that  polyphase  component.  Hence  our  analysis 
bank  in  example  5.1  needs  only  (16  +  15)/2  =  15.5  MPU’s  and  (31  *  2  +  2)/2  =  32  APU’s.  Table  IV 
gives a comparison of three different IIR filters with FIR filters. The data for the Type II filter is based
on  [11].  The  three  IIR  filters  labeled  (1),  (2)  and  (3)  correspond  to  designs  based  on  the  prototype 
FIR  filters  64D,  48D  and  32D  respectively  (examples  5.1,  5.2  and  5.3). 
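
The operation counts quoted above follow from simple arithmetic, sketched below. The formulas are the ones used in the text for Example 5.1, (p0 + p1)/2 multiplications and (2(p0 + p1) + 2)/2 additions per unit time. The branch orders for designs (2) and (3) are assumptions on our part (12 and 11, 8 and 7), chosen to be consistent with the coefficient tables of Appendix B. The function name is illustrative.

```python
def analysis_bank_cost(order0, order1):
    """Return (MPUs, APUs) for an analysis bank whose two AP branches run at half rate.

    Uses the counts given in the text: an order-p AP filter needs p multipliers
    and 2p adders, and the bank total is formed as in Example 5.1.
    """
    total = order0 + order1
    mpu = total / 2
    apu = (2 * total + 2) / 2
    return mpu, apu

# (branch orders) for the designs based on 64D, 48D and 32D; the last two are assumed.
designs = {"(1) 64D": (16, 15), "(2) 48D": (12, 11), "(3) 32D": (8, 7)}
for name, (p0, p1) in designs.items():
    print(name, analysis_bank_cost(p0, p1))
```

With these orders the sketch reproduces the MPU and APU rows of Table IV for the three IIR designs.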
Feature                      Type I FIR       Type II FIR      IIR (1)     IIR (2)     IIR (3)

Distortions in               ALD canceled     ALD canceled     ALD canceled
Filter Bank                  AMD minimized    AMD eliminated   AMD eliminated
                             PHD eliminated   PHD eliminated   PHD minimized

No of MPU's for              32               17               15.5        11.5        7.5
Analysis Bank

No of APU's for              32               49               32          24          16
Analysis Bank

Average group                63               63               63          47          31
delay (samples)

Stop Band Atten.  - IWL      65 dB            42 dB            65 dB       53.1 dB     37.5 dB
                  - FWL      68 dB            06 dB            66.5 dB     53 dB       36.9 dB

AMD               - IWL      0.002 dB         0 dB             0 dB        0 dB        0 dB
                  - FWL      0.006 dB         3.320 dB         0 dB        0 dB        0 dB

Group delay       - IWL      0                0                ±0.0125     ±0.0400     ±0.0231
distortion
(samples)         - FWL      0                0                ±0.0309     ±0.0400     ±0.0579

SNR               - IWL      144 dB           179 dB           143 dB      135 dB      117 dB
                  - FWL      125 dB           21 dB            125 dB      124 dB      115 dB

Table IV: Comparison of three different IIR filter designs with FIR filters.

Based on the information in Table IV, the IIR filter is better than the Type I and Type II filters in every
respect except the group delay distortion. When compared with the Type I filters, the IIR filter
would be much better when efficient implementation is necessary, since the MPU count is very much
less for the IIR filter while the same stopband attenuation is achieved. It is obvious that the Type II
filters, to be of any practical use, have to be implemented with the 2-multiplier lattice structure. Then
the Type II filter will have the same MPU count as the Type I filter. Hence again the IIR filter is
much more efficient than the Type II filter. The price paid for the better efficiency is the small group
delay distortion. However, since the group delay distortion is very small, for most applications this is
bound to be acceptable.


6  Conclusion 


We have presented an algorithm to design IIR filter banks based on FIR prototypes. The analysis
filters of the IIR filter bank have approximately linear phase. It was shown that the Type I FIR filter
family was the most suitable prototype filter. Although it is possible to obtain the IIR filter bank
with direct optimization, the design becomes computationally more efficient when the BR scheme is
used to obtain initial values for the filter coefficients. Furthermore, the partial error bounds for the BR
scheme give a good estimate of the phase error that can be achieved. Design examples demonstrated
the application of the algorithm and indicated the computational advantage of the IIR filter bank
compared to FIR designs. In most cases the BR procedure gives good IIR filters needing little or
no optimization. However, optimization could be used to obtain lower group delay distortion at the
expense of lower stopband attenuation. The group delay distortion of the filter bank is so small that
for most applications an IIR implementation seems to be better than an FIR implementation due to
the computational advantage.

A further simplification of the method is possible due to the work in [28], where an algorithm to
obtain a reduced order IIR filter from a high order FIR filter, without explicitly constructing an
interim balanced realization, has been presented. This method is much simpler than the conventional
method of BR.

As indicated by the simulations with FWL, efficiently implemented Type II systems lack robustness
under coefficient quantization. Furthermore, the stopband characteristics of these filters degrade
under coefficient quantization and hence the 2-multiplier lattice structure should be used to implement
Type II filters. Hence robust Type II filter banks are not as efficient as IIR filter banks. The IIR
filters are not sensitive to coefficient quantization effects, mainly due to the structurally robust allpass
polyphase structure. The Type I filters too exhibit good robustness under coefficient quantization.

In summary, the IIR filter banks have all the desired properties, viz., good stopband attenuation,
low computational complexity, linear phase (approximately), low reconstruction error and low
sensitivity.

An alternative approach to design the proposed IIR filter banks is to use the eigenfilter method [10]
to approximate the phase of the AP polyphase components with that of the prototype filter polyphase
components. This method yields IIR filter banks which are nearly identical to those obtained with the
proposed method. However, this method requires more computer time and a larger number of operations
than the proposed method.

Further research problems would be to investigate the improvement in performance of the filter banks
when implemented in δ-operator formulation [29] [30] and the extension of the proposed method to the
2-dimensional case using the BR algorithm presented in [31].
Appendix A


Let  T{z)  be  a  nearly  AP  transfer  function  with 

T(z)  =  n  ^  /  rk  +  2;-^  \ 

k=i  (1  +  rike-J^*2-i)(l  +  V ^  ^ 

where  we  assume  that  sup<.  Ir^  -  rk\  <  Ar  and  sup^  \^k  -  <i>k\  <  ^4>-  This  is  true  since  T{z)  is  nearly 
AP.  Note  that  f{z)  =T{z)\p=p  is  AP,  where p=  {pi,P2,  •  •  •  ,Pm)  andp  =  (pi,p2,  •  •  •  ,Pm),  Pit  -  {rk,4>k), 
Pk  =  (rjt,  i>k),  The  group  delay  is  given  by 


t  =  0.5 


n  1  / 

EE  r 

fc=lj=0 


1  -f2 


+ 


i-rL 


+  2fk  cos(a;  +  (-1)-?^^)  +  '  1  +  2rfc  cos(cc;  +  +  r| 


+ 


S,  r 


i-fl 


^9  + 


1-iL 


A:=:n-f  1 


Hence  we  can  write 


A  ^ 


Arjt  +  2_) 

p=p  fc=l 

where  the  higher  order  terms  have  been  neglected.  Therefore 


+  2ffc  cos(a;)  +  1  +  2rk  cos{co)  +  ^ 

A4>k 


p=p 


A:=l 


dr 


dru 


p=p 


k=l 


dr 


d<f>k 


|A<^d 


p=p 


(60) 

(61) 


(62) 


Now 

But  if 

then 

and 


Qt  _ _ cos(a;  +  (—1)^0^)  +  2rfc  +  cos(a;  +  ( — l)-^<^fc) 

drk  ^  [1  +  2rfc  cos(a;  +  (-l)i(^fc)  +  r|]2 

_ ,  .  cos  a  +  2r  +  cos  a 

CJyOtj  ■'  2 

1  +  zr  cos  a  + 

[C'(a)]„^a^  =  C(0)  =  1 

[C'(Q:)]min  =  C{n)  =  -1 


So  1^7(0:)  I  <  1.  Hence 


dr 

drk 


< 


1 


E 


_ 1 _ 

1  +  2rk  cos(uj  +  (-ly^k)  +  r^ 


Next  consider 

d'T  ^  Q  _  J.2-.  ^  (-1)^  +  Vfcsin(a;+  (-l)J>fc) 

d<^k  [1  +  2rk  cos(cli  +  (-iy<f>k)  + 


But  if 
then 


sin  a 


1  +  2r  cos  a  + 
[T>(Q!)]mai  =  D{a)\ 


1 

1  — 


(64) 

(65) 

(66) 
(67) 


(68) 


(69^ 

(70) 


Therefore 


and  P(a')]min  =!>(«)  I 


a=sin  1  — 


Hence 


dr 


d<f>k 


<  r  ^ 

“  1  +  2rfc cos(w  +  (-1)3 4)k)  +  rl 


(71) 

(72) 


n  1 


|At|  < 


EEt 

k=lj=Q 


|Arfc|  +  rfc|A<^fc| 


+  2ffc  COs(w  +  {—l)34>k)  +  ^  k^4-i  ^ 


k  A;=n+1 


< 


< 


+ 


Ar 


Ar  +  RA(f)  A  _ -  -k _  .  _ 

l-R'^  ik=i,=o  1 +2f<=cos(w  + {-l)i(^^) +  f|  1-R^  f,£^^^l  +  2f k cos cj  +  fl 


1  —  f2 

^  ^k 


1  —  f  2 
^  ^k 


-( 


Ar  +  fiA(^  VV- 

1  _  R2  2^  1 

^  ^  L*=l  j=Q  3 

Ar  +  RAA  . 

1-^2  J  ^ 


+  2fk  cos(u)  +  (-l)J0ifc)  +rl^  ifez^i  1  cos  w  +  f| 


1-^1 


(73) 

(74) 

(75) 

(76) 


where $R = \sup_k \hat{r}_k$ and $\hat{\tau} = \tau|_{p=\hat{p}}$. Hence the difference in group delay between $T(z)$ and $\hat{T}(z)$ is small
provided that none of the poles of $T(z)$ is too close to the unit circle in the z-plane, i.e., $T(z)$ has a
good stability margin. We also know that the magnitudes of $T(z)$ and $\hat{T}(z)$ are close. Hence we can
conclude that $\|T(e^{j\omega}) - \hat{T}(e^{j\omega})\|_\infty$ is small.


Appendix  B 


  n    d_0(n)         d_1(n)

  0    1.0000         1.0000
  1    0.2437        -0.2437
  2    (illegible)    0.1423
  3    (illegible)   -0.0956
  4   -0.0213         0.0663
  5    0.0106        -0.0456
  6   -0.0042         0.0303
  7    0.0005        -0.0190
  8    0.0012         0.0109
  9    (illegible)   -0.0057
 10    (illegible)    0.0030
 11    (illegible)   -0.0022
 12    (illegible)    (illegible)

Table B-I


  n    d_0(n)    d_1(n)

  0    1.0000    1.0000
  1    0.2409   -0.2408
  2   -0.0774    0.1354
  3    0.0335   -0.0847
  4   -0.0134    0.0523
  5    0.0032   -0.0301
  6-   0.0012    0.0155
  7   -0.0016   -0.0074
  8    0.0043    -

Table B-II
Figure 1: Two-channel


Figure 2: Polyphase implementation o

Figure 3: Magnitude response of 64D Johnston filter


Figure 4: Magnitude response of the 64D Johnston filter polyphase components,
(a) magnitude response of X_0(z) and (b) magnitude response of X_1(z).

Figure 5: Frequency response of the IIR filter designed in example 1, before optimization.
(a) Magnitude response and (b) group delay response.


Figure 6: Frequency response of the IIR filter designed in example 1, after optimization.
(a) Magnitude response and (b) group delay response.


1.  INTRODUCTION 


The synthesis of digital filters using finite convolution was first proposed by Gold and
Jordan [1]. Finite convolution was also used by Voelcker and Hartquist [2] who introduced
recursive block processing to exploit the computational advantages of the FFT and other
similar block algorithms. Burrus and Parks [3] meanwhile considered the time domain design
of recursive digital filters using a matrix formulation of the problem to aid in calculating filter
structure.

Block matrix filters (BMFs) provide a state-variable description of block feedback on a
matrix implementation of convolution and were first proposed by Burrus [4]. By investigating
several block recursion methods, Burrus was also able to demonstrate increased computational
efficiency in the processing of digital signals, especially for filters of very high order [5].

The relationship between block implementation of IIR filters as proposed by Burrus and
the matrix formulation of IIR filters via direct convolution methods as proposed by Gold and
Jordan was investigated by Mitra and Gnanasekaran [6], eventually leading to the development
of new structures for block implementation of IIR digital filters [7]. Barnes and Shinnaka [8]
showed that all irreducible state-space realizations of the matrix filter can be derived through a
procedure using a simple realization of the required transfer function. Soon thereafter, Clark,
Mitra, and Parker [9] presented a block adaptive filtering procedure using a generalized LMS
algorithm for calculating the filter coefficients. In turn, Cioffi [10] applied a deterministic
time-domain least-squares criterion within each of the data blocks of the block-adaptive filter
to exploit pipelining of the order recursions.

Matrix  filters  are  amenable  to  analysis  and  optimization  under  a  variety  of  criteria.  One 
example  is  the  rank  reduction  of  the  least  square  estimator  of  a  Gaussian  process  and  the 
resulting improvements in signal-to-noise ratios [11]. Another is the estimation of structured
covariance  block  matrices  from  stationary  time  series  of  multivariate  Gaussian  processes  [12]. 
Filter  matrices  can  also  be  constrained  so  that  system  limitations  are  reflected,  a  topic  briefly 
considered  by  Ahmed  and  Rao  [13].  The  incorporation  of  matrix  structure  constraints  in  the 
design  of  MMSE  block  matrix  filters  was  introduced  by  Corral  and  Lindquist  [14]  [15]  and 
shown  to  produce  computationally  efficient  forms. 

In  light  of  the  computational  power  and  efficiency  of  block  matrix  filters,  it  is  of  general 
interest  to  consider  those  cases  when  the  possible  form  of  the  block  matrix  filter  is  restricted 
due to a priori constraints. The problem may therefore be summarized as follows:

To find the properties and conditions for the realizability of a block matrix
filter given the prescribed constraints, and to design an optimum block matrix
filter satisfying these constraints.

It  is  the  purpose  of  this  paper  to  introduce  a  method  for  incorporating  the  practical 
constraints  of  system  implementation  into  the  design  of  SISO  FIR  block  matrix  filters  for  time 
domain  digital  signal  processing.  The  paper  is  organized  as  follows.  Section  II  introduces 
the  basic  definitions  and  nomenclature  used  in  the  paper.  Section  III  provides  a  statement  of 
the  Minimax  Algorithm  in  general  and  then  applies  the  algorithm  to  the  following  matrix 
structures:  time-varying  memoryless  diagonal,  time-invariant  periodic  circulant,  and  time- 
invariant  non-periodic  Toeplitz.  Section  IV  provides  some  graphical  examples  of  the  algorithm 
in  use  with  several  constraints  imposed.  Section  V  provides  simulation  results  for  32  X  32 
matrices  and  various  signal  forms  in  estimation  applications  using  the  additive  noise  model. 
Section  VI  considers  extensions  and  limitations  of  the  proposed  algorithm.  The  proofs  and 
properties  of  the  Minimax  Algorithm  are  relegated  to  the  Appendix. 

2.  BASIC  DEFINITIONS 

From  a  general  viewpoint,  there  are  essentially  two  types  of  constraints  that  define  the 
characteristics  of  a  block  matrix  filter  (BMF),  namely: 

1.  Constraints  due  to  the  pre-defined  rules  governing  the  relationship  between  the  output 

and  the  input  through  the  matrix  implementation  of  convolution.  These  are  constraints 
based  on  the  operations  of  the  filter. 

2.  Constraints  due  to  the  design  requirements  governing  the  characteristics  of  the  output  in 

terms  of  the  input  and  the  filter  matrix.  These  are  constraints  based  on  the  operators  of 
the  filter. 

This  paper  is  concerned  with  item  2.  We  first  begin  with  some  basic  definitions. 

Definition 2.1. A time domain BMF system is the implementation of the discrete convolution

$$y(n) \overset{\text{def}}{=} \sum_{k=0}^{N-1} h(n,k)\,x(k), \qquad n = 0, 1, \ldots, N-1 \tag{2.1}$$

where $y(n) = y \in \mathbb{R}^{N \times 1}$ is the output vector, $x(k) = x \in \mathbb{R}^{N \times 1}$ is the input vector, and
$h(n,k) = h \in \mathbb{R}^{N \times N}$ is the filter matrix.

Eq. (2.1) assumes that we divide the input data stream into data vectors of length N,
process the data vectors in an N × N system, and then reconstruct the scalar output
stream from the processed data vectors. The corresponding single-input single-output (SISO)
system is shown in Fig. 2.1 [8].
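
The block processing described by Eq. (2.1) can be sketched in a few lines of Python. This is only a minimal illustration: the function name `block_filter` and the 2 × 2 matrix are hypothetical and do not come from the paper, and the stream length is assumed to be a multiple of N.

```python
def block_filter(x, h):
    """Apply an N x N filter matrix h to the scalar stream x, one block at a time."""
    n = len(h)
    if len(x) % n != 0:
        raise ValueError("stream length must be a multiple of the block length N")
    y = []
    for start in range(0, len(x), n):
        block = x[start:start + n]            # data vector of length N
        for row in h:                         # y(n) = sum_k h(n, k) x(k)
            y.append(sum(c * v for c, v in zip(row, block)))
    return y                                  # reconstructed scalar output stream


# Illustrative 2 x 2 filter matrix (an assumption, not taken from the paper).
h_example = [[1, 0],
             [1, 1]]
y_example = block_filter([1, 2, 3, 4], h_example)
print(y_example)  # [1, 3, 3, 7]
```

Each block is processed independently, which is what makes the scheme memoryless from block to block in this simple form.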

Definition 2.2. The matrix structure constraint is the interrelationship between the elements
of the matrix $\hat{h}$ as

$$\hat{h}_{i,j} = \hat{h}_{p(i),\,q(j)} \tag{2.2}$$

where $p, q : \mathbb{Z} \mapsto \mathbb{Z}$.

Table 2.1 shows the matrix structures we consider and the corresponding interrelation functions
for the indices.

If one of the system requirements is that the filter provide an output within a certain
allotted amount of time, or is limited in storage, then we have the following:

Definition 2.3 The speed-memory constraint is the a priori setting of certain matrix elements
to zero as

$$\hat{h}_{k,l} = 0 \quad \text{for any } k, l \tag{2.3}$$

For purposes of application, we consider the causal Toeplitz matrix with $\hat{h}_{i,j} = 0$ for $j > i$ to
be a speed-memory constrained matrix.
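
The structure and speed-memory constraints above can be checked mechanically. The following sketch tests a square matrix for the diagonal, circulant, Toeplitz, and causal patterns; the function names are hypothetical, and the circulant test assumes the convention that entry (i, j) equals entry (0, (j - i) mod N).

```python
def is_diagonal(h):
    n = len(h)
    return all(h[i][j] == 0 for i in range(n) for j in range(n) if i != j)


def is_toeplitz(h):
    # Constant along every diagonal: h[i][j] == h[i-1][j-1].
    n = len(h)
    return all(h[i][j] == h[i - 1][j - 1] for i in range(1, n) for j in range(1, n))


def is_circulant(h):
    # Every row is a cyclic shift of row 0 (assumed convention).
    n = len(h)
    return all(h[i][j] == h[0][(j - i) % n] for i in range(n) for j in range(n))


def is_causal(h):
    # Speed-memory pattern of the causal case: zero for j > i.
    n = len(h)
    return all(h[i][j] == 0 for i in range(n) for j in range(i + 1, n))


circ = [[3, 4], [4, 3]]          # illustrative matrices, chosen for the sketch
causal_toep = [[5, 0], [2, 5]]
print(is_circulant(circ), is_toeplitz(causal_toep), is_causal(causal_toep))
```

A circulant matrix always passes the Toeplitz test as well, consistent with the circulant structure being a special case of the Toeplitz structure.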

Let the matrix $h \in \mathbb{R}^{N \times N}$ be non-singular and such that it minimizes some error in

$$y = hx \tag{2.4}$$

Let the matrix $\hat{h} \in \mathbb{R}^{N \times N}$ be subject to the constraints defined in Eqs. (2.2) and (2.3) and its
output be given by the relation

$$\hat{y} = \hat{h}x = y + \delta y \tag{2.5}$$

The difference between Eqs. (2.4) and (2.5) is the “bias” constraint vector

$$\delta y = \hat{y} - y = \hat{h}x - hx = (\hat{h} - h)x \tag{2.6}$$

Since h is non-singular, we can write

$$x = h^{-1}y \tag{2.7}$$

Therefore, the bias vector is

$$\delta y = (\hat{h} - h)h^{-1}y = (\hat{h}h^{-1} - I)y \tag{2.8}$$

where I is the identity matrix. We take the norm of Eq. (2.8) as

$$\|\delta y\| = \|(\hat{h}h^{-1} - I)y\| \tag{2.9}$$

$$\le \|\hat{h}h^{-1} - I\|\,\|y\| \tag{2.10}$$

where the norm $\|\cdot\|$ satisfies the norm axioms [16] and, in addition, satisfies the Schwarz
inequality [18]

$$\|hp\| \le \|h\|\,\|p\| \tag{2.11}$$

We  are  now  led  to  the  following: 


Definition 2.4 The relative performance bias is the measure

$$\frac{\|\delta y\|}{\|y\|} \le \|\hat{h}h^{-1} - I\| \tag{2.12}$$

Remark  2.1.  The  requirement  that  h  be  non-singular  assumes  that  in  any  matrix  filter  system 
it  is  possible  to  recover  the  input  from  the  output,  i.e.,  an  inverse  linear  transformation  exists. 
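
As a numerical illustration of Definition 2.4, the sketch below evaluates the relative performance bias in the supremum norm for a pair of 2 × 2 matrices and checks the bound of Eq. (2.12) for one input vector. The helper names, the matrices, and the input vector are assumptions chosen for the sketch; exact rational arithmetic is used so that the comparison is not affected by rounding.

```python
from fractions import Fraction as F


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def inverse_2x2(m):
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("h must be non-singular")
    return [[d / det, -b / det], [-c / det, a / det]]


def sup_norm(m):
    """Maximum absolute row sum; for a column vector this is the largest |entry|."""
    return max(sum(abs(v) for v in row) for row in m)


def relative_performance_bias(h_hat, h):
    prod = matmul(h_hat, inverse_2x2(h))
    n = len(h)
    return sup_norm([[prod[i][j] - (1 if i == j else 0) for j in range(n)]
                     for i in range(n)])


# Illustrative values (assumptions for this sketch).
h = [[F(5), F(-8)], [F(2), F(4)]]            # unconstrained matrix
h_hat = [[F(5), F(-8)], [F(11, 8), F(5)]]    # Toeplitz-constrained approximation
bias_value = relative_performance_bias(h_hat, h)

x = [[F(1)], [F(-2)]]                        # arbitrary input vector
y = matmul(h, x)
y_hat = matmul(h_hat, x)
delta_y = [[y_hat[i][0] - y[i][0]] for i in range(2)]
ratio = sup_norm(delta_y) / sup_norm(y)
print(bias_value, ratio)                     # 1/8 1/8
```

For this particular input the ratio meets the bound with equality, which shows that the bound of Eq. (2.12) can be tight.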

Definition 2.5 A BMF is constrained if any or all of the constraints in Definitions 2.2, 2.3, and
2.4 are imposed. A BMF is realizable if it meets all the prescribed constraints. A constrained
BMF is optimum if it is realizable and minimizes the relative performance bias.

For purposes of demonstrating the various constraints, we write $\hat{h}$ to denote a constrained
BMF and $\tilde{h}$ to denote a “further constrained” BMF (i.e., additional constraints are imposed
relative to $\hat{h}$).

Our problem therefore becomes one of determining the realizability of the BMF subject to
the prescribed constraints, and of finding the optimum based on the minimization of the relative
performance bias. Fig. 2.2 shows the flow graph for imposing the various constraints for any
given BMF system.

3.  STATEMENT  OF  MINIMAX  ALGORITHM 

Let F be the set of all non-singular matrices and W be the set of constrained matrices.
By making the metric $\rho(\hat{h}h^{-1}, I)$ be induced by the supremum norm we have the relative
performance bias of Eq. (2.12), and the equivalent mathematical problem becomes finding
$\hat{h}^{opt} \in W$ among all possible $\hat{h} \in W$ such that

$$\|\hat{h}^{opt}h^{-1} - I\|_\infty \le \|\hat{h}h^{-1} - I\|_\infty \tag{3.1}$$

for all $h \in F$ and where $\|\cdot\|_\infty$ denotes the supremum norm.

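From Eq. (3.1) we can construct an equivalent requirement

$$\|\hat{h}g - I\|_\infty \le K \tag{3.2}$$

where $g = h^{-1}$ and $K > 0 \in \mathbb{R}$. For the pth row of $\hat{h}g - I$ we can write the row sums as

$$\sum_{q=0}^{N-1} |\hat{h}_p g_q - \delta_{p,q}| \le K \tag{3.3}$$

where $\hat{h}_p = (\hat{h}_{p,0}, \hat{h}_{p,1}, \ldots, \hat{h}_{p,N-1})$ is the pth row of $\hat{h}$, $g_q = (g_{0,q}, g_{1,q}, \ldots, g_{N-1,q})^T$ is the qth
column of g, and $\delta_{p,q}$ is the Kronecker delta, $\delta_{p,q} = 1$ if p = q and 0 otherwise.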
Hence, let

$$|\hat{h}_p g_q - \delta_{p,q}| \le \lambda_{p,q} K = K_{p,q} \tag{3.4}$$

with

$$\sum_{q=0}^{N-1} \lambda_{p,q} = 1 \quad \text{and} \quad \sum_{q=0}^{N-1} K_{p,q} = K \tag{3.5}$$

For  any  iteration  m  >  1  we  can  remove  the  inequalities  from  Eq.  (3.4)  and  form  a  set  of 
subconditions  as 

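$$\text{Condition } (B_p): \quad -K_{p,p}(m) \le \hat{h}_p g_p - 1 \le +K_{p,p}(m) \tag{3.6}$$

for p = 0, 1, ..., N - 1, and

$$\text{Condition } (C_{pq}): \quad -K_{p,q}(m) \le \hat{h}_p g_q \le +K_{p,q}(m), \qquad p \ne q \tag{3.7}$$

for p, q = 0, 1, ..., N - 1, where $K_{p,q}(m)$ is the weighted bound at the mth iteration.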

We note that $\hat{h}_p g_q - \delta_{p,q} \le +K_{p,q}(m)$ and $\hat{h}_p g_q - \delta_{p,q} \ge -K_{p,q}(m)$ are convex sets for
vectors $\hat{h}_p$, $g_q$. Indeed, the elements of Eq. (3.4) are closed half-spaces and depend only on
the hyperplane established by the equality [17].

Let  us  denote  the  convex  set  established  by  conditions  (Bp)  and  (Cpq)  by  T.  Since  T  is 
determined  by  the  intersection  of  a  finite  set  of  linear  constraints  (3.6)  and  (3.7),  the  boundary 
of  T  (if  T  is  not  empty)  will  consist  of  sections  of  some  of  the  corresponding  hyperplanes.  T 
will be a region in $\mathbb{R}^D$ ($D \le N^2$ is the relevant number of coefficients of $\hat{h}$) and can either
be  empty,  a  bounded  convex  polyhedron,  or  a  convex  polyhedron  which  may  be  unbounded 
in  some  direction  in  general.  If  T  is  empty,  then  no  h  is  realizable;  if  it  is  bounded,  then  the 
convex  polyhedron  establishes  the  region  of  realizability  for  h  and  an  optimum  exists  in  the 
sense  of  Eq.  (3.2);  and  if  T  is  unbounded,  then  no  optimum  can  be  found. 

We  are  therefore  led  to  the  following: 


Realizability  Theorem.  If  the  convex  polyhedron  established  by  conditions  (Bp)  and  (Cpq) 
for  a  given  K  is  not  empty,  then  the  constrained  BMF  h  is  realizable. 


Minimax Algorithm. If a filter is realizable, perform the following:

1. Select an objective function such as

$$f_{obj} = \sum_{p(i)} \sum_{q(j)} \hat{h}_{p(i),\,q(j)}$$

where $p(i)$, $q(j)$ denotes the number of unconstrained variables for each i, j.

2. Given the bound K, set up conditions $(B_p)$ and $(C_{pq})$.

3. With the above, find the minimum $\hat{h}^{min}(m)$ and the maximum $\hat{h}^{max}(m)$ for the mth
iteration using the simplex method of linear programming with unconstrained variables
(cf. Remark 3.1).

4. Calculate the optimum as

$$\hat{h}^{opt}(m) = \tfrac{1}{2}\left(\hat{h}^{max}(m) + \hat{h}^{min}(m)\right) \tag{3.9}$$

5. Calculate the new bound as

$$K(m+1) = \|\hat{h}^{opt}(m)\,g - I\|_\infty \tag{3.10}$$

6. Determine the following:

a. If the bound $K(m+1) \ge K$, terminate the process and select $\hat{h}^{opt}(m-1)$.

b. Else, set K = K(m + 1) as bound with weights $\lambda_{p,q} = 1$ for all p, q and go to 2.

This  is  a  statement  of  the  Realizability  Theorem  and  Minimax  Algorithm  without 
proof.  The  proofs  and  properties  of  the  Realizability  Theorem  and  Minimax  Algorithm 
are  relegated  to  the  Appendix.  However,  there  are  several  immediate  remarks  which  are 
directly  related  to  the  results  and  implications  of  the  Realizability  Theorem  and  the 
Minimax  Algorithm. 

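Remark 3.1. Linear programming problems deal with nonnegative real variables in general.
Since $\hat{h}_{i,j} \in \mathbb{R}$ we need to use unconstrained variables of the form

$$\hat{h}_{i,j} = h'_{i,j} - h''_{i,j}, \qquad h'_{i,j},\, h''_{i,j} \ge 0 \quad \text{all } i, j \tag{3.11}$$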


Remark 3.2. Condition $(B_p)$ corresponds to N conditions in general for p = 0, 1, ..., N - 1.
Condition $(C_{pq})$ corresponds to N(N - 1) conditions in general for p = 0, 1, ..., N - 1 and
q = 0, 1, ..., N - 1 with q ≠ p. This corresponds to a total of $N(N-1) + N = N^2$ subconditions
for the system of inequalities of Eq. (3.2). If we count each inequality as a separate hyperplane,
then we have a total number of $2N^2$ inequality conditions for the two bounding hyperplanes
of each inequality enclosing a convex region. Since every admissible domain has only finitely


many  extreme  points,  if  the  domain  is  defined  by  r  inequalities  >  0,  y  =  l,...,r,  and  s 


equations,  it  can  have  at  most 


del  (r  +  s)!  . 

—  — —  extreme  points  [18]. 


Remark 3.3. It will be shown in Appendix Part I that for the first iterative step K(1) can be
set to be arbitrarily large. Ideally, we would like to have $0 < K(1) \le 1$ imposed a priori in
Eq. (3.2) because the trivial solution $\hat{h}^0$ with $\hat{h}^0_{i,j} = 0$ for all i, j gives $\|\hat{h}^0 g - I\|_\infty = \|I\|_\infty = 1$.
However, it may be necessary to make the convex polyhedron large enough to hold at least
one non-trivial optimum.


Remark 3.4. The requirement of Eq. (3.5) corresponds to a convex combination of K(m) and
is obviously closed and convex for any combination of its elements. In order to establish a
non-empty convex polyhedron for conditions $(B_p)$ and $(C_{pq})$, it is recommended that $\lambda_{p,q} > 0$.
Equal weights are easier to implement in the simplex method [19] and we emphasize their use
here.

3.1  Application  of  Minimax  Algorithm  to  Structural  Constraints 

The main advantages of the proposed Minimax Algorithm are that it

1.  addresses  all  the  constraints  in  question,  and 

2.  is  based  on  the  well-known  theory  of  linear  programming. 

These  advantages,  coupled  with  the  straightforward  and  systematic  procedure  for  obtaining 
the  optimum  BMF  once  the  extreme  points  of  the  convex  polyhedron  are  established,  make 
the  Minimax  Algorithm  highly  useful  for  any  constraint  of  interest.  To  illustrate  the  above, 
let  us  consider  diagonal,  circulant,  and  Toeplitz  structural  constraints. 


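Diagonal BMF

For $\hat{h} \in \mathbb{R}^{N \times N}$ with $\hat{h}_{i,j} = 0$ for all $i \ne j$, we can rewrite the conditions as

$$\text{Condition } (B_p): \quad 1 - K_{p,p}(m) \le \hat{h}_{p,p}\,g_{p,p} \le 1 + K_{p,p}(m) \tag{3.12}$$

for p = 0, 1, ..., N - 1, and

$$\text{Condition } (C_{pq}): \quad -K_{p,q}(m) \le \hat{h}_{p,p}\,g_{p,q} \le +K_{p,q}(m) \tag{3.13}$$

for p, q = 0, 1, ..., N - 1, $p \ne q$. An objective function is

$$f_{obj} = \sum_{i=0}^{N-1} \hat{h}_{i,i} \tag{3.14}$$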
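
For the diagonal structure the iteration can be sketched without a general linear programming solver. This is a simplification made for illustration only, not the paper's simplex implementation: because each diagonal coefficient appears only in its own row of conditions, the feasible region is a box, and the minimum and maximum of a sum objective sit at the lower and upper interval endpoints. The function names and the test matrix are assumptions.

```python
from fractions import Fraction as F


def bias(d, g):
    """Sup norm (maximum absolute row sum) of diag(d) g - I."""
    n = len(g)
    return max(sum(abs(d[p] * g[p][q] - (1 if p == q else 0)) for q in range(n))
               for p in range(n))


def minimax_diagonal(g, k_init, max_iter=50):
    n = len(g)
    k = F(k_init)
    weight = F(1, n)                 # equal weights 1/N in the first iteration
    previous = None                  # optimum of the previous iteration
    for _ in range(max_iter):
        k_pq = weight * k
        d_opt = []
        for p in range(n):
            lo, hi = None, None
            for q in range(n):
                target = 1 if p == q else 0
                c = g[p][q]
                if c == 0:
                    if k_pq < target:            # condition cannot be met
                        return previous, k
                    continue
                ends = sorted([(target - k_pq) / c, (target + k_pq) / c])
                lo = ends[0] if lo is None else max(lo, ends[0])
                hi = ends[1] if hi is None else min(hi, ends[1])
            if lo is None or lo > hi:            # unbounded or empty region
                return previous, k
            d_opt.append((lo + hi) / 2)          # midpoint of minimum and maximum
        k_new = bias(d_opt, g)
        if k_new >= k:                           # step 6a: keep previous optimum
            return previous, k
        previous, k, weight = d_opt, k_new, F(1)  # step 6b: weights set to 1
    return previous, k


# Illustrative g = h^{-1} (an assumption for this sketch).
g = [[F(1, 9), F(2, 9)], [F(-1, 18), F(5, 36)]]
d_best, k_best = minimax_diagonal(g, 10)
print(d_best, k_best)
```

With this g the first row cannot be brought below a row sum of 1, so the procedure stops after the second pass and keeps the first optimum, which mirrors the behavior discussed for the trivial solution.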


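Circulant BMF

For a circulant BMF, $\hat{h}_{i,j} = \hat{h}_{\mathrm{mod}_N(i+n_0),\,\mathrm{mod}_N(j+n_0)}$, we have the new conditions

$$\text{Condition } (B_p): \quad 1 - K_{p,p}(m) \le \sum_{i=0}^{N-1} \hat{h}_{0,i}\,g_{\mathrm{mod}_N(i+p),\,p} \le 1 + K_{p,p}(m) \tag{3.15}$$

for p = 0, 1, ..., N - 1, and

$$\text{Condition } (C_{pq}): \quad -K_{p,q}(m) \le \sum_{i=0}^{N-1} \hat{h}_{0,i}\,g_{\mathrm{mod}_N(i+p),\,q} \le +K_{p,q}(m) \tag{3.16}$$

for p, q = 0, 1, ..., N - 1, $p \ne q$. An objective function is

$$f_{obj} = N \sum_{i=0}^{N-1} \hat{h}_{0,i} \tag{3.17}$$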


Toeplitz BMF

For a Toeplitz BMF, $\hat{h}_{i,j} = \hat{h}_{i+n_0,\,j+n_0}$ and we have the new conditions

$$\text{Condition } (B_p): \quad 1 - K_{p,p}(m) \le \sum_{i=0}^{N-p-1} \hat{h}_{0,i}\,g_{i+p,\,p} + \sum_{j=1}^{p} \hat{h}_{j,0}\,g_{p-j,\,p} \le 1 + K_{p,p}(m) \tag{3.18}$$

for p = 0, 1, ..., N - 1, and

$$\text{Condition } (C_{pq}): \quad -K_{p,q}(m) \le \sum_{i=0}^{N-p-1} \hat{h}_{0,i}\,g_{i+p,\,q} + \sum_{j=1}^{p} \hat{h}_{j,0}\,g_{p-j,\,q} \le +K_{p,q}(m) \tag{3.19}$$

for p, q = 0, 1, ..., N - 1, $p \ne q$. An objective function is

$$f_{obj} = \sum_{i=0}^{N-1} \sum_{j=0}^{N-1} \hat{h}_{i,j} \tag{3.20}$$

The conditions $(B_p)$ and $(C_{pq})$ and the objective function are used to initialize the
simplex tableau for a linear programming solution to the problem using unconstrained variables
as stated in Remark 3.1. Should different structures be required, the formulations of $(B_p)$ and
$(C_{pq})$ can be readily modified.


3.2 Application of Minimax Algorithm to Speed-Memory Constraint

Given that a block matrix filter must meet a prescribed speed-memory constraint as in
Definition 2.3, two approaches for the incorporation of the constraint in the Minimax
Algorithm are possible, namely: the A Priori and A Posteriori approaches (cf. Fig. 2.2).
The main idea is to set $\tilde{h}_{k,l} = 0$ for select values of k and l. If we incorporate the speed-memory
constraint using the A Posteriori approach, we can relate the relative performance bias of the
further constrained matrix $\tilde{h}$ to the actual output of the original matrix h as follows.

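Once the zero elements of $\tilde{h}$ are selected, two paths are possible when implementing the
A Posteriori approach:

1. Approximate the original matrix h using the formulation

$$\frac{\|\delta\tilde{y}\|_\infty}{\|y\|_\infty} \le \|\tilde{h}h^{-1} - I\|_\infty \tag{3.21}$$

where $\delta\tilde{y} = \tilde{y} - y$ and $\tilde{y} = \tilde{h}x$.

2. Approximate the constrained matrix $\hat{h}$ using the bounds

$$\frac{\|\tilde{y} - y\|_\infty}{\|y\|_\infty} \le \frac{\|\tilde{y} - \hat{y}\|_\infty}{\|y\|_\infty} + \frac{\|\hat{y} - y\|_\infty}{\|y\|_\infty} \tag{3.22}$$

where, with $\delta\tilde{y} = \tilde{y} - \hat{y}$,

$$\frac{\|\delta\tilde{y}\|_\infty}{\|\hat{y}\|_\infty} \le \|\tilde{h}\hat{h}^{-1} - I\|_\infty \tag{3.23}$$

where $\hat{h}$ is assumed to be non-singular. Here, we have used the fact that

$$\|\hat{y}\|_\infty \le \left(1 + \|\hat{h}h^{-1} - I\|_\infty\right)\|y\|_\infty \tag{3.24}$$

If we want $\|\tilde{y} - y\|_\infty / \|y\|_\infty \le 1$, we must consequently assure that

$$\frac{\|\delta\tilde{y}\|_\infty}{\|\hat{y}\|_\infty} \le \frac{1 - \|\hat{h}h^{-1} - I\|_\infty}{1 + \|\hat{h}h^{-1} - I\|_\infty} \tag{3.25}$$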

Remark 3.5. In using procedure 2, we note that the better the original constrained matrix
$\hat{h}$ approximates h, the better the further constrained matrix $\tilde{h}$ approximates h. Eq. (3.25)
provides us with a test for the suitability of $\tilde{h}$ in approximating h in general.

4.  EXAMPLES  USING  THE  MINIMAX  ALGORITHM 

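° 2 × 2 Circulant Example 1

We consider the approximation of a circulant matrix to itself, namely,

$$\hat{h} = \begin{pmatrix} \hat{h}_{0,0} & \hat{h}_{0,1} \\ \hat{h}_{0,1} & \hat{h}_{0,0} \end{pmatrix}, \qquad h = \begin{pmatrix} 3 & 4 \\ 4 & 3 \end{pmatrix}, \qquad g = h^{-1} = \begin{pmatrix} -3/7 & 4/7 \\ 4/7 & -3/7 \end{pmatrix} \tag{4.1}$$

We set the weights to be equal, with $\lambda_{p,q} = 1/N = .5$ for all p, q. We also set K(1) = 10 so
that $K_{p,q}(1) = 5$. From Eqs. (3.15) and (3.16), the set of inequalities becomes

$$\begin{aligned}
-4 &\le -\tfrac{3}{7}\hat{h}_{0,0} + \tfrac{4}{7}\hat{h}_{0,1} \le 6 \\
-5 &\le \tfrac{4}{7}\hat{h}_{0,0} - \tfrac{3}{7}\hat{h}_{0,1} \le 5 \\
-5 &\le \tfrac{4}{7}\hat{h}_{0,0} - \tfrac{3}{7}\hat{h}_{0,1} \le 5 \\
-4 &\le -\tfrac{3}{7}\hat{h}_{0,0} + \tfrac{4}{7}\hat{h}_{0,1} \le 6
\end{aligned} \tag{4.2}$$

From Eq. (3.17) the objective function is

$$f_{obj} = 2\hat{h}_{0,0} + 2\hat{h}_{0,1} \tag{4.3}$$

with unconstrained variables

$$\hat{h}_{0,0} = (h'_{0,0} - h''_{0,0}), \qquad \hat{h}_{0,1} = (h'_{0,1} - h''_{0,1}), \qquad h'_{0,0},\, h''_{0,0},\, h'_{0,1},\, h''_{0,1} \ge 0$$

Applying the simplex method to the above set of inequalities yields the extrema

$$\begin{aligned}
\text{Min. point}: &\quad \hat{h}_{0,0} = -32, \quad \hat{h}_{0,1} = -31 \\
\text{Max. point}: &\quad \hat{h}_{0,0} = 38, \quad \hat{h}_{0,1} = 39
\end{aligned} \tag{4.4}$$

Eq. (3.9) gives the solution

$$\hat{h}_{0,0} = \frac{38 - 32}{2} = 3, \qquad \hat{h}_{0,1} = \frac{39 - 31}{2} = 4 \tag{4.5}$$

This is the original matrix. Figure 4.1 shows the convex polytope for this example.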
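
The numbers of this example can be reproduced with a short sketch. It assumes the matrix being approximated is h = (3 4; 4 3), which is consistent with the extrema and the solution listed above, and it replaces the simplex method by direct enumeration of the four corner points of the feasible region; this shortcut is valid here only because a linear objective over a bounded polytope attains its extremes at vertices and this polytope is a parallelogram. All names are hypothetical.

```python
from fractions import Fraction as F
from itertools import product

h = [[F(3), F(4)], [F(4), F(3)]]   # assumed matrix being approximated
K = F(5)                           # weighted bound K_pq(1) = 0.5 * 10

# With u = (h00, h01) g, the diagonal condition reads 1 - K <= u0 <= 1 + K and the
# off-diagonal condition reads -K <= u1 <= K; row 1 repeats the same two conditions.
box = [(1 - K, 1 + K), (-K, K)]


def to_coeffs(u):
    """Map u = (h00, h01) g back to (h00, h01) = u h, since g = h^{-1}."""
    return (u[0] * h[0][0] + u[1] * h[1][0],
            u[0] * h[0][1] + u[1] * h[1][1])


def objective(v):
    return 2 * v[0] + 2 * v[1]


vertices = [to_coeffs(u) for u in product(*box)]
v_min = min(vertices, key=objective)
v_max = max(vertices, key=objective)
h_opt = tuple((a + b) / 2 for a, b in zip(v_min, v_max))
print(v_min, v_max, h_opt)
```

The midpoint of the two extreme vertices recovers the coefficients 3 and 4, in agreement with Eq. (4.5).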
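° 2 × 2 Toeplitz Example 2
° ° A Priori approach

We consider the following matrices but let us a priori set $\hat{h}_{0,1} = 0$, requiring

$$\hat{h} = \begin{pmatrix} \hat{h}_{0,0} & 0 \\ \hat{h}_{1,0} & \hat{h}_{0,0} \end{pmatrix}, \qquad h = \begin{pmatrix} 5 & -8 \\ 2 & 4 \end{pmatrix}, \qquad g = h^{-1} = \begin{pmatrix} 1/9 & 2/9 \\ -1/18 & 5/36 \end{pmatrix} \tag{4.6}$$

Initially, let $\lambda_{p,q} = .5$ for all p, q, and let K(1) = 10 so that $K_{p,q}(1) = 5$. From Eqs. (3.18)
and (3.19), the set of inequalities becomes

$$\begin{aligned}
-4 &\le \tfrac{1}{9}\hat{h}_{0,0} \le 6 \\
-5 &\le \tfrac{2}{9}\hat{h}_{0,0} \le 5 \\
-5 &\le \tfrac{1}{9}\hat{h}_{1,0} - \tfrac{1}{18}\hat{h}_{0,0} \le 5 \\
-4 &\le \tfrac{2}{9}\hat{h}_{1,0} + \tfrac{5}{36}\hat{h}_{0,0} \le 6
\end{aligned} \tag{4.7}$$

Eq. (3.20) gives the objective function

$$f_{obj} = 2\hat{h}_{0,0} + \hat{h}_{1,0} \tag{4.8}$$

Applying the Minimax Algorithm gives the following matrix in the first iteration

$$\hat{h}(1) = \begin{pmatrix} 0 & 0 \\ 4.5 & 0 \end{pmatrix}, \qquad \hat{h}(1)h^{-1} - I = \begin{pmatrix} -1 & 0 \\ .5 & 0 \end{pmatrix} \tag{4.9}$$

so that $\|\hat{h}(1)h^{-1} - I\|_\infty = 1$. We cannot improve this answer because the selection of $\hat{h}_{0,1} = 0$
has made the trivial solution $\hat{h} = 0$ the optimum solution.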

° ° A Posteriori approach

In the A Posteriori approach, we first calculate the full approximating matrix $\hat{h}$ and
then calculate the further constrained matrix $\tilde{h}$ based either on the constrained matrix $\hat{h}$ or
the unconstrained matrix h. Applying the Minimax Algorithm we find that the optimum
constrained matrix is

$$\hat{h} = \begin{pmatrix} 5 & -8 \\ 1.375 & 5 \end{pmatrix} \tag{4.10}$$

One possible heuristic for selecting the proper value $\hat{h}_{i,j}$ to set to zero is to select the value
closest to zero in magnitude. Consequently, we would make $\tilde{h}_{1,0} = 0$. We also note that if we
wanted to approximate to $\hat{h}$ (or h), we must satisfy Eq. (3.25), that is,

$$\frac{\|\delta\tilde{y}\|}{\|\hat{y}\|} \le \frac{1 - .125}{1 + .125} = .7778$$

Let us first attempt to approximate to $\hat{h}$, requiring

$$\tilde{h} = \begin{pmatrix} \tilde{h}_{0,0} & \tilde{h}_{0,1} \\ 0 & \tilde{h}_{0,0} \end{pmatrix}, \qquad \tilde{g} = \hat{h}^{-1} = \begin{pmatrix} .1389 & .2222 \\ -.0382 & .1389 \end{pmatrix} \tag{4.11}$$

Let K(1) = 10 so that $K_{p,q}(1) = 5$. From Eqs. (3.18) and (3.19),
the set of inequalities now becomes

$$\begin{aligned}
-4 &\le .1389\,\tilde{h}_{0,0} - .0382\,\tilde{h}_{0,1} \le 6 \\
-5 &\le .2222\,\tilde{h}_{0,0} + .1389\,\tilde{h}_{0,1} \le 5 \\
-5 &\le -.0382\,\tilde{h}_{0,0} \le 5 \\
-4 &\le .1389\,\tilde{h}_{0,0} \le 6
\end{aligned} \tag{4.12}$$

Eq. (3.20) gives the objective function

$$f_{obj} = 2\tilde{h}_{0,0} + \tilde{h}_{0,1} \tag{4.13}$$

Applying the Minimax Algorithm gives the following matrix in the first iteration

$$\tilde{h}(1) = \begin{pmatrix} 5 & -8 \\ 0 & 5 \end{pmatrix} \quad \text{with} \quad \tilde{h}(1)\hat{h}^{-1} - I = \begin{pmatrix} 0 & 0 \\ -.1910 & -.3055 \end{pmatrix} \tag{4.14}$$

so that $\|\tilde{h}(1)\hat{h}^{-1} - I\|_\infty \le .4965$. We can improve the answer by additional iterations, obtaining
the optimum as

$$\tilde{h}^{opt} = \begin{pmatrix} 5.934 & -9.778 \\ 0 & 5.934 \end{pmatrix} \tag{4.15}$$

with the relative performance bias $\|\tilde{h}^{opt}h^{-1} - I\|_\infty \le .5055$.

Alternatively, we can approximate to the original unconstrained h, requiring

$$\tilde{h} = \begin{pmatrix} \tilde{h}_{0,0} & \tilde{h}_{0,1} \\ 0 & \tilde{h}_{0,0} \end{pmatrix}, \qquad h = \begin{pmatrix} 5 & -8 \\ 2 & 4 \end{pmatrix}, \qquad g = h^{-1} = \begin{pmatrix} 1/9 & 2/9 \\ -1/18 & 5/36 \end{pmatrix} \tag{4.16}$$

Applying the Minimax Algorithm gives the following optimum Toeplitz matrix

$$\tilde{h}^{opt} = \begin{pmatrix} 6.0466 & -9.67456 \\ 0 & 6.0466 \end{pmatrix} \tag{4.17}$$

so that $\|\tilde{h}^{opt}h^{-1} - I\|_\infty \le .4961$. This is very close to the result of Eq. (4.15).
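
The two bias figures quoted in this example can be checked numerically. The sketch below assumes g = h^{-1} = (1/9 2/9; -1/18 5/36), i.e. h = (5 -8; 2 4), and evaluates the supremum norm of the product minus the identity for the two upper-triangular Toeplitz matrices of Eqs. (4.15) and (4.17). The function name is hypothetical.

```python
def sup_norm_bias(h_tilde, g):
    """Maximum absolute row sum of h_tilde g - I."""
    n = len(g)
    row_sums = []
    for p in range(n):
        total = 0.0
        for q in range(n):
            entry = sum(h_tilde[p][k] * g[k][q] for k in range(n))
            total += abs(entry - (1.0 if p == q else 0.0))
        row_sums.append(total)
    return max(row_sums)


g = [[1 / 9, 2 / 9], [-1 / 18, 5 / 36]]            # assumed inverse of h
via_constrained = [[5.934, -9.778], [0.0, 5.934]]  # Eq. (4.15)
direct = [[6.0466, -9.67456], [0.0, 6.0466]]       # Eq. (4.17)

bias_via = sup_norm_bias(via_constrained, g)
bias_direct = sup_norm_bias(direct, g)
print(round(bias_via, 4), round(bias_direct, 4))
```

In both cases the second row determines the norm, since the zero imposed below the diagonal leaves that row with a single free coefficient.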

5.  SIMULATION  RESULTS  USING  MINIMAX  ALGORITHM 

In  communications  systems  data  is  often  transmitted  in  some  digital  modulation  format 
such  as  BPSK,  QPSK,  QAM,  etc.,  with  a  sinusoidal  carrier.  Determining  the  presence  of  the 
proper  format  via  estimation,  detection,  or  correlation  techniques  is  critical  to  minimize  Bit 
Error  Rates  (BER).  In  addition  to  typical  sinusoidal  waveforms,  it  is  of  general  interest  to 
investigate  the  performance  of  any  proposed  method  with  signals  exhibiting  sharp  transitions. 
Consequently,  we  provide  simulations  for  both  sinusoidal  and  non-sinusoidal  waveforms  in  the 
difficult  application  of  signal  estimation  in  noisy  environment. 

For purposes of simulation, we consider estimation applications with non-singular matrices
h such that hx = d, that is, the matrix being approximated provides the desired output from
the input. For the constrained approximating matrix $\hat{h}$ we considered the following forms:

a. Time-varying memoryless diagonal.

b. Time-invariant periodic circulant.

c. Time-invariant non-periodic Toeplitz.

d. Time-invariant non-periodic causal Toeplitz.

The simulations were implemented in FORTRAN on a VAX 4000-600. The signals simulated
had N = 32 samples with period T = 1. The additive white noise model was used with
a signal-to-noise ratio of 10 dB. One group of signals has two periodic signals: the sine and
cosine waveforms with period = .19635 for one full cycle in the 32-sample window. The other
group has two non-periodic signals: the ramp with an increment = .03125 for each sample and
the exponential with a damping factor = .125. Plots of the matrices h, actual input and desired
output for the corresponding matrices are given in Figures 5.1-5.4.

In addition to the termination requirements of the Minimax Algorithm, we further
imposed the following scheme: If the error was being reduced but by a value less than a
threshold r, that is, if

$$\|\hat{h}^{opt}(m-1)\,g - I\|_\infty - \|\hat{h}^{opt}(m)\,g - I\|_\infty \le r$$

the procedure was terminated and the optimum solution for the previous iteration was used.

For our simulations, an original value of r = .05 was selected. However, it was often necessary
to modify the above procedure to prematurely terminate the iterative process, allowing
for non-trivial solutions, even if these violate the relative performance bias bound of unity.
This is due to the fact that for large matrices, the optimality condition is hard to obtain (see
Section VI). The IMSL routine DLPRS [20] was used for the Minimax Algorithm.

Fig. 5.5 shows the plots of the optimum constrained diagonal matrix, the actual output,
and the relative performance bias bound for each signal type. For the diagonal matrix, the
output contains more noise than the input, so performance is inferior.

Fig.  5.6  shows  the  plots  of  the  optimum  constrained  circulant  matrix,  the  actual  output, 
and  the  relative  performance  bias  bound  for  each  signal  type.  The  circulant  matrices  perform 
well, especially for periodic signals as expected. The actual relative performance bias shows
that  the  Minimax  Algorithm  is  able  to  extract  the  essential  features  of  the  input  from  the 
matrix  being  approximated. 

Fig. 5.7 shows the plots of the optimum constrained Toeplitz matrix, the actual output,
and  the  relative  performance  bias  bound  for  each  signal  type.  The  Toeplitz  matrix  filters 
perform  well  in  general,  but  the  remarkable  feature  of  this  simulation  is  that  for  periodic 
signals,  the  best  Toeplitz  structure  is  the  circulant  structure.  This  is  due  to  the  fact  that 
the  Minimax  Algorithm  is  approximating  the  same  type  of  general  matrix  for  the  periodic 
signals,  and  that  the  circulant  structure  is  a  special  case  of  the  Toeplitz  structure. 

Fig.  5.8  shows  the  plots  of  the  optimum  constrained  causal  Toeplitz  matrix,  the  actual 
output,  and  the  relative  performance  bias  bound  for  each  signal  type.  The  causal  Toeplitz 
matrix  can  be  viewed  as  a  Toeplitz  matrix  with  a  speed-memory  constraint  imposed  such  that 
$\hat{h}_{i,j} = 0$ for j > i. Although the performance is inferior to the circulant and Toeplitz filters, it
is  better  than  the  diagonal  filters  while  still  only  storing  N  elements. 

6. EXTENSIONS AND LIMITATIONS OF MINIMAX ALGORITHM

If the Minimax Algorithm is adjusted for minimizing the column sum, the result is a
minimization of the corresponding 1-norm of the output bias. Consequently, the Realizability
Theorem and Minimax Algorithm address both norms through a simple change in
formulation.

A  cursory  analysis  of  the  Minimax  Algorithm  would  reveal  that  the  main  problem  is 
addressed  and  solved  through  an  iterative  procedure  that  is  computationally  intensive:  There 
are $N^2$ subconditions in general, and there are as many as 2N variables for the matrices we are
considering.  However,  we  can  note  that  N- - conditions  are  redundant.  As  a  result 
of  applying  the  simplex  method,  these  redundant  conditions  are  never  considered,  thereby 
reducing  the  actual  number  of  computations.  For  the  speed-memory  constraint,  additional 
variables  are  eliminated,  further  reducing  the  computational  load. 

If we increase N, however, the majority of practical situations will yield K(m) > 1
for any m. The reasons are outlined below:

1. The matrix $\hat{h}$ is not well-suited to approximate h. This is especially true for speed-memory
constrained matrices.

2. In order to satisfy $\|\hat{h}h^{-1} - I\|_\infty \le 1$, each subcondition must satisfy

$$\sum_{q=0}^{N-1} |\hat{h}_p g_q - \delta_{p,q}| \le 1 \tag{6.1}$$

for each row p. As N increases, it becomes more difficult to satisfy Eq. (6.1).

Given the fact that for large N the result may be the trivial solution, it may not
be necessary to terminate the process. In the Appendix it is shown that as long as certain
conditions are satisfied, the Minimax Algorithm reduces the bound K. While an optimum
solution with bound K > 1 may not be desirable in absolute terms, it may be “tolerable” for
the application at hand.

The supremum norm is the maximum of the row sum of the relative performance bias. It
is possible that one row, say row k, gives the condition K > 1 but all other rows i = 1, 2, ...,
i ≠ k satisfy K < 1. The resulting output will therefore have its maximum error at y(k).
By additional post-processing it may be possible to reduce the error at y(k) and thereby still
satisfy the realizability conditions. This is also applicable to the reduction of the maximum
of the column sums of the relative performance bias for the 1-norm problem.

In the simulations of Section V we calculated the optimum $\hat{h}$ even under the condition
that K > 1. This is to demonstrate the method while still keeping in mind that the trivial
solution may be necessary in order to ensure K = 1. If K > 1, then the trivial solution
may be the only solution.

Although the Minimax Algorithm finds the solution via the relative performance bias bound, it can be extended to the more traditional methods of minimizing the error of the output vector from the desired output vector without any loss of generality. Consider the minimization of the error

$$e = y - d = \hat{h}x - d \tag{6.2}$$

where $y$ is the actual output vector and $d$ is the desired output vector. There exists at least one $h$ such that $hx = d$. Furthermore, if $h$ is non-singular, then $x = h^{-1}d$. Therefore, the error can be written as


$$e = \hat{h}x - hx = (\hat{h} - h)x = (\hat{h} - h)h^{-1}d \tag{6.3}$$

Taking  the  supremum  norm  of  both  sides  and  simplifying,  we  get  the  main  result 

$$\|e\|_\infty \le \|\hat{h}h^{-1} - I\|_\infty \, \|d\|_\infty \tag{6.4}$$

We can minimize $\|e\|_\infty$ by minimizing the relative performance bias.
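The bound relating the output error to the relative performance bias can be checked numerically on a small case. The sketch below assumes a $2 \times 2$ system with a closed-form inverse; all function names and the test matrices are illustrative and not taken from the paper.

```python
def mat_vec(a, v):
    return [sum(x * y for x, y in zip(row, v)) for row in a]


def mat_mul(a, b):
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def inv2(h):
    """Inverse of a non-singular 2x2 matrix."""
    (a, b), (c, d) = h
    det = a * d - b * c
    if det == 0:
        raise ValueError("h must be non-singular")
    return [[d / det, -b / det], [-c / det, a / det]]


def error_and_bound(h, h_hat, d):
    """Return (||e||_inf, ||h_hat h^-1 - I||_inf * ||d||_inf) for e = h_hat x - d."""
    g = inv2(h)
    x = mat_vec(g, d)                       # x = h^-1 d
    e = [yi - di for yi, di in zip(mat_vec(h_hat, x), d)]
    bias = mat_mul(h_hat, g)
    for i in range(2):
        bias[i][i] -= 1.0                   # h_hat h^-1 - I
    bias_norm = max(sum(abs(v) for v in row) for row in bias)
    return max(abs(v) for v in e), bias_norm * max(abs(v) for v in d)


# Hypothetical example: diagonal approximation of a full matrix.
err, bound = error_and_bound([[2.0, 1.0], [1.0, 3.0]],
                             [[2.0, 0.0], [0.0, 3.0]],
                             [1.0, -2.0])
```

For this example the error norm stays below the bound, as the inequality requires; the bound is not tight here.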
7. CONCLUSION


We have introduced the Realizability Theorem for testing the realizability of a BMF subject to the matrix structure, speed-memory, and relative performance bias constraints. The Minimax Algorithm uses the Realizability Theorem in an iterative procedure to find the optimum BMF subject to the constraints.

The  Minimax  Algorithm  is  based  on  the  simplex  method  of  linear  programming  applied 
to  a  system  of  linear  constraints  obtained  from  the  supremum  norm  of  the  relative  performance 
bias.  The  optimum  at  each  iterative  step  is  the  midway  point  between  the  minimum  and 
the  maximum  points  of  the  convex  polytope  established  by  the  linear  constraints.  For  each 
iterative  step,  a  new  bound  is  calculated  from  the  optimum  and  used  for  the  next  iteration.  It 
has  been  shown  that  this  iterative  technique  always  reduces  the  bounds  as  long  as  the  convex 
polytope  established  by  the  linear  constraints  is  not  empty. 

The Minimax Algorithm was shown to be extendable to a variety of error parameters under a generalized approach. In addition, it was shown that the application of the above constraints was straightforward without any loss of generality. Examples and simulation results show that the Minimax Algorithm can be applied to the design of optimum block matrix filters subject to the prescribed system constraints.

APPENDIX:  PROOFS  AND  PROPERTIES 

The Appendix is concerned with the proofs and properties of the Realizability Theorem and Minimax Algorithm proposed in Section III. The Appendix is organized as follows. Part I provides the proofs and properties of the Realizability Theorem. Part II provides a detailed analysis of the constituent parts of the Minimax Algorithm, while Part III provides the convergence proof of the algorithm.

Part I. REALIZABILITY THEOREM

The Realizability Theorem establishes the existence of an optimum or, equivalently, a region of realizability. In order for an optimum to exist, the convex polyhedron $T$ established by Conditions $(B_p)$ and $(C_{p,q})$ must be bounded in all directions. We must consequently have

Theorem 1.1. For any finite $K > 0$, conditions $(B_p)$ and $(C_{p,q})$ establish a convex polyhedron that is either empty or bounded in all directions.

Proof. The empty case is trivial, so let us prove the bounded case. The matrix $g = h^{-1}$ used to establish conditions $(B_p)$ and $(C_{p,q})$ is the inverse of the matrix $h$ that is being approximated. Therefore, the columns of $g$ are linearly independent and none of the intersecting pairs of bounding hyperplanes established by these conditions are parallel to each other for any $p$ or $q$. Therefore, they all intersect and bound the resulting convex polyhedron. □

The domain of realizability is a bounded convex polyhedron (convex polytope) established by the prescribed constraints. We will now show that if an optimum exists for a nonempty convex polytope constructed from a given set of weights and an initial bound $K$, the optimum exists for any bound $K' > K$.

Theorem 1.2. Given a set of weights $\lambda_{p,q}$ such that $K_{p,q} = \lambda_{p,q}K$ and the optimum exists for this value of $K$, then the optimum exists for any $K' > K$.

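Proof. For any $p$ and $q$, the bounds are

$$-K_{p,q} \le \hat{h}_p g_q - \delta_{p,q} \le K_{p,q} \quad \text{or} \quad 0 \le \hat{h}_p g_q - \delta_{p,q} + K_{p,q} \le 2K_{p,q} \tag{1.1}$$

Now let $K' = \alpha K$ where $1 < \alpha < \infty$. Also, $K'_{p,q} = \alpha K_{p,q}$. Substituting into Eq. (1.1) we get

$$0 \le \hat{h}_p g_q - \delta_{p,q} + K'_{p,q} \le 2K'_{p,q} \tag{1.2a}$$

$$0 \le \hat{h}_p g_q - \delta_{p,q} + \alpha K_{p,q} \le 2\alpha K_{p,q} \tag{1.2b}$$

Dividing Eq. (1.2b) through by $\alpha$ we obtain

$$0 \le \frac{1}{\alpha}\left(\hat{h}_p g_q - \delta_{p,q}\right) + K_{p,q} \le 2K_{p,q} \tag{1.3}$$

From Eq. (1.1) we get the main result

$$0 \le -\frac{1}{\alpha}K_{p,q} + K_{p,q} \le 2K_{p,q} \tag{1.4}$$

or

$$0 \le \left(1 - \frac{1}{\alpha}\right)K_{p,q} \le 2K_{p,q}$$

which is true for $1 < \alpha < \infty$. □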

Remark 1.1. Theorem 1.2 basically states that our ability to find an optimum is not sensitive to the initial bound $K$. (We will show this in more detail in the development of the actual method in Section IV.) $K$ can be set arbitrarily large in order to assure we enclose at least one non-trivial optimum (cf. Remark 2.2 in Section III).

A  useful  result  is  the  following: 


Theorem 1.3. If $\|\hat{h}^{\mathrm{opt}}g - I\|_\infty = K'$ for some $K > K'$, then the optimum exists for any set of weights $\lambda_{p,q}$ such that $K_{p,q} = \lambda_{p,q}K \ge K'$ for all $p, q$ and the given $K$.
Proof. By Theorem 1.2, we can select the initial bound $K$ arbitrarily large. Since $T$ is non-empty for any $K > \|\hat{h}^{\mathrm{opt}}g - I\|_\infty$, we can make $K$ such that $K > K'$ and $T$ is still not empty. Moreover, we can make $K_{p,q} = \lambda_{p,q}K \ge K'$ for any $p, q$. The set of weights satisfying this relation, i.e., $\lambda_{p,q} \ge K'/K$, assures that $T$ is not empty and consequently the optimum exists. □

Remark 1.2. The optimum does not exist for any set of weights in general, only for those specified by Theorem 1.3, namely, the weights that guarantee that the convex polytope $T$ is not empty. Since any of the above sets of weights can be used, it is obvious that uniform weights $\lambda_{p,q}$ (i.e., $\lambda_{p,q} = \lambda_{r,s} \ge K'/K$ for all $p, q, r, s$) can be used with equal effectiveness.

The optimum block matrix filter therefore exists and can be determined from the convex polytope $T$. However, the optimum is not unique in general because of the supremum norm's max operator. Additional constraints need to be imposed in order to find a global optimum. We will now show that the Minimax Algorithm finds the optimum through the iterative procedure described in Section III. We first begin with a brief analysis of the constituent parts of the Minimax Algorithm.

Part  II.  THE  OBJECTIVE  FUNCTION 

In linear programming problems, the objective function is either minimized or maximized based on the linear constraints given. For the Minimax Algorithm, the objective function must have two basic properties:

1. The objective function must be a linear function of all the variables of $\hat{h}$ in question. This guarantees that all the extreme points of the convex polytope $T$ will be considered.

2. The objective function's formulation must result in a hyperplane that is not parallel to any of the bounding hyperplanes that constitute the convex polytope $T$.

The  objective  function  is  swept  through  the  convex  polytope  T  to  obtain  either  the  minimum 
or  maximum  extreme  point. 

The Minimax Algorithm addresses the realizability of optimum block matrix filters with real elements. Consequently, the extreme points of $T$ can be positive or negative (this is the reason we need unconstrained variables). An objective function of the form

$$f_{\mathrm{obj}} = \beta\hat{h} = \beta_{0,0}\hat{h}_{0,0} + \beta_{0,1}\hat{h}_{0,1} + \cdots + \beta_{i,j}\hat{h}_{i,j} + \cdots, \qquad \beta_{i,j} > 0 \tag{2.1}$$

where $\hat{h} = (\hat{h}_{0,0}, \hat{h}_{0,1}, \ldots, \hat{h}_{i,j}, \ldots)^{T}$ with $\hat{h}_{i,j} = \hat{h}'_{i,j} - \hat{h}''_{i,j}$ denotes the vector of relevant variables of the matrix $\hat{h}$ and $\beta = (\beta_{0,0}, \beta_{0,1}, \ldots, \beta_{i,j}, \ldots)$ is the coefficient vector, assures that if there are negative (positive) extreme vertices in $T$, these will result in a minimum (maximum) solution for $f_{\mathrm{obj}}$.
Although Eq. (2.1) satisfies property 1, we must select the proper $\beta > 0$ to satisfy property 2. It is easy to see that if we make the set $\{\beta\}$ countable by letting $\beta_{i,j} \in \mathbb{N}$, then it is a countable subset of $\mathbb{R}$ and hence is of measure zero [21]. This assures property 2; that is, relative to the set of all possible hyperplanes $\hat{h}_p g_q - \delta_{p,q}$ with $g_q, \hat{h}_p \in \mathbb{R}^N$, the set $\{\beta\}$ is of measure zero.

The authors selected an objective function in which the coefficients $\beta_{i,j}$ were set equal to the number of elements for the variable $\hat{h}_{i,j}$, that is, in proper form,

$$\beta_{i,j} = \hat{h}_{\rho(i),\sigma(j)}(N) \quad \text{for all } i, j \tag{2.2}$$

This corresponds to an objective function in which the variables are weighted by the number of times they appear in the matrix $\hat{h}$ (e.g., Eqs. (3.14), (3.17), and (3.20) in Section III). Since we are working with unconstrained variables, it is easy to show that

$$\frac{\partial f_{\mathrm{obj}}}{\partial \hat{h}'_{i,j}} = -\frac{\partial f_{\mathrm{obj}}}{\partial \hat{h}''_{i,j}}$$

that is, the derivatives are equal but in opposite directions. With the weighting coefficients $\beta_{i,j}$ we force the largest positive and negative elements of the given convex polytope via the prescribed objective function.

The weights are selected such that if we form the ratio of each $\beta_{i,j}$ to the total number of weighted elements and sum the resultant coefficients for each matrix structure, we get 1. For the diagonal matrices, each coefficient is a uniform weight as $\frac{1}{N}\hat{h}_{i,i}$, $i = 0, 1, \ldots, N-1$, and since there are $N$ variables, the result is 1. For circulant matrices, each coefficient is a uniform weight as $\frac{N}{N^2}\hat{h}_{i,0} = \frac{1}{N}\hat{h}_{i,0}$, $i = 0, 1, \ldots, N-1$, and since there are again $N$ variables, the sum is also 1.

Toeplitz matrices have the coefficients weighted as $\frac{N-i}{N^2}\hat{h}_{i,0}$ and $\frac{N-i}{N^2}\hat{h}_{0,i}$ for $i = 0, 1, \ldots, N-1$. The sum of the coefficients also results in 1:

$$\frac{N}{N^2} + 2\sum_{i=1}^{N-1}\frac{N-i}{N^2} = 1 \tag{2.4}$$

where $\frac{N}{N^2}$ is from the element $\hat{h}_{0,0}$ and the factor 2 is a result of summing the coefficients for each element $\hat{h}_{i,0}$ and $\hat{h}_{0,i}$. From Eq. (2.4) we have

$$\frac{1}{N} + \frac{2}{N^2}\sum_{i=1}^{N-1}(N-i) = \frac{1}{N} + \frac{2}{N^2}\left[N(N-1) - \frac{N(N-1)}{2}\right] = \frac{1}{N} + \frac{N-1}{N} = 1$$
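The normalization of the objective-function weights can be verified with exact arithmetic. The sketch below assumes the weights are the occurrence counts divided by the total number of weighted elements ($1/N$ for diagonal and circulant variables, $(N-i)/N^2$ for the $i$-th Toeplitz sub- and super-diagonal); the function names are hypothetical.

```python
from fractions import Fraction


def diagonal_weights(n):
    """One variable per diagonal entry; each appears once among n elements."""
    return [Fraction(1, n)] * n


def circulant_weights(n):
    """n variables; each appears n times among n*n elements."""
    return [Fraction(n, n * n)] * n


def toeplitz_weights(n):
    """Main diagonal appears n times; each off-diagonal pair appears n - i times."""
    weights = [Fraction(n, n * n)]
    for i in range(1, n):
        w = Fraction(n - i, n * n)
        weights.extend([w, w])      # sub-diagonal and super-diagonal variables
    return weights


# Each structure's weights sum to exactly 1 for the sizes tried here.
totals = {n: (sum(diagonal_weights(n)),
              sum(circulant_weights(n)),
              sum(toeplitz_weights(n)))
          for n in range(1, 9)}
```

Using `Fraction` avoids floating-point rounding, so the sums can be compared to 1 exactly.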
Part III. THE MINIMAX ALGORITHM

The procedure for determining the realizability of a constrained block matrix filter requires constructing a convex polyhedron $T$ bounded by $K(1) > 0$. By Theorem 1.1, $T$ is a convex polytope. By Theorem 1.2, $K(1)$ can be chosen arbitrarily large. Since we have an intersection of hyperplanes we note that each initial subcondition can be represented by the relation

$$\hat{h}_p^{\max}(1)\,g_q - \delta_{p,q} = +K_{p,q}(1) - \gamma_{p,q}^{\max}(1) \tag{3.1a}$$

$$\hat{h}_p^{\min}(1)\,g_q - \delta_{p,q} = -K_{p,q}(1) + \gamma_{p,q}^{\min}(1) \tag{3.1b}$$

where

$$0 \le \gamma_{p,q}^{\min}(1),\ \gamma_{p,q}^{\max}(1) \le 2K_{p,q}(1) \tag{3.2}$$

constitute the offsets resulting from the actual intersection of the hyperplanes. If we let the initial optimum be calculated as a convex combination of the extreme points as

$$\hat{h}^{\mathrm{opt}}(2) = \frac{1}{c}\hat{h}^{\max}(1) + \frac{c-1}{c}\hat{h}^{\min}(1), \qquad c > 1 \tag{3.3}$$

then we can show the following

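Lemma 3.1. If $0 < \gamma_{p,q}^{\min}(1), \gamma_{p,q}^{\max}(1) < K_{p,q}(1)$ then

$$\hat{h}_p^{\mathrm{opt}}(2)g_q - \delta_{p,q} < \hat{h}_p^{\max}(1)g_q - \delta_{p,q} \quad \text{and} \quad \hat{h}_p^{\mathrm{opt}}(2)g_q - \delta_{p,q} > \hat{h}_p^{\min}(1)g_q - \delta_{p,q} \tag{3.4}$$

and if $K_{p,q}(1) < \gamma_{p,q}^{\min}(1), \gamma_{p,q}^{\max}(1) < 2K_{p,q}(1)$ then

$$\hat{h}_p^{\mathrm{opt}}(2)g_q - \delta_{p,q} > \hat{h}_p^{\max}(1)g_q - \delta_{p,q} \quad \text{and} \quad \hat{h}_p^{\mathrm{opt}}(2)g_q - \delta_{p,q} < \hat{h}_p^{\min}(1)g_q - \delta_{p,q} \tag{3.5}$$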


Remark 3.1. Lemma 3.1 states that if we take the convex combination of the two extreme vertices of $T$, we are never outside the convex polytope and, furthermore, we improve the answer. This result may seem trivial at first, but its proof details the iterative procedure and leads to establishing a value for $c$.

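Proof. The optimum for any subcondition is

$$\hat{h}_p^{\mathrm{opt}}(2)g_q - \delta_{p,q} = \left[\frac{1}{c}\hat{h}_p^{\max}(1) + \frac{c-1}{c}\hat{h}_p^{\min}(1)\right]g_q - \delta_{p,q} = \frac{1}{c}\hat{h}_p^{\max}(1)g_q + \frac{c-1}{c}\hat{h}_p^{\min}(1)g_q - \delta_{p,q} \tag{3.6}$$

in general. Substituting Eqs. (3.1) into Eq. (3.6) and rearranging we get

$$\hat{h}_p^{\mathrm{opt}}(2)g_q - \delta_{p,q} = \frac{2-c}{c}K_{p,q}(1) - \frac{1}{c}\gamma_{p,q}^{\max}(1) + \frac{c-1}{c}\gamma_{p,q}^{\min}(1) \tag{3.7}$$

If $0 < \gamma_{p,q}^{\min}(1), \gamma_{p,q}^{\max}(1) < K_{p,q}(1)$, we want to show that

$$\hat{h}_p^{\mathrm{opt}}(2)g_q - \delta_{p,q} < \hat{h}_p^{\max}(1)g_q - \delta_{p,q}$$

or, from Eq. (3.7) and Eq. (3.1a),

$$\frac{2-c}{c}K_{p,q}(1) - \frac{1}{c}\gamma_{p,q}^{\max}(1) + \frac{c-1}{c}\gamma_{p,q}^{\min}(1) < K_{p,q}(1) - \gamma_{p,q}^{\max}(1) \tag{3.8}$$

Assume the opposite is true; then multiplying Eq. (3.8) by $c$ and rearranging we get

$$(c-1)\gamma_{p,q}^{\max}(1) + (c-1)\gamma_{p,q}^{\min}(1) \ge (2c-2)K_{p,q}(1) \tag{3.9a}$$

or

$$(c-1)\left[\gamma_{p,q}^{\max}(1) + \gamma_{p,q}^{\min}(1)\right] \ge (c-1)\,2K_{p,q}(1) \tag{3.9b}$$

$$\gamma_{p,q}^{\max}(1) + \gamma_{p,q}^{\min}(1) \ge 2K_{p,q}(1) \tag{3.9c}$$

But this is a contradiction since $0 < \gamma_{p,q}^{\min}(1), \gamma_{p,q}^{\max}(1) < K_{p,q}(1)$, so we have proven Eq. (3.8).

The same argument can be followed for the other conditions by substituting the appropriate assumptions and inequalities in Eqs. (3.9).

Lemma 3.1 can be extended to any iteration $m$ for the nonempty convex polytope $T$.

From Eq. (3.8), we see that if we set $c = 2$, we eliminate $K_{p,q}(m)$ from each iteration subcondition, that is, for any iteration $m$, and letting

$$\hat{h}^{\mathrm{opt}}(m+1) = \frac{1}{2}\left(\hat{h}^{\max}(m) + \hat{h}^{\min}(m)\right) \tag{3.10}$$

then

$$\hat{h}_p^{\mathrm{opt}}(m+1)g_q - \delta_{p,q} = \frac{1}{2}\left(\gamma_{p,q}^{\min}(m) - \gamma_{p,q}^{\max}(m)\right) \tag{3.11}$$

If any $\gamma_{p,q}^{\min}(m) = \gamma_{p,q}^{\max}(m)$, $\hat{h}_p^{\mathrm{opt}}(m+1)g_q - \delta_{p,q}$ is identically zero, which corresponds to collapsing the convex polyhedron for that particular $p$ and $q$ to a single hyperplane. At the next iteration, the convex polytope is a fortiori empty.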
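The algebra behind the convex combination can be checked numerically for a single subcondition. The sketch below works with scalar values of $\hat{h}_p g_q - \delta_{p,q}$ at the two extreme points; the function names and the sample numbers are illustrative assumptions.

```python
def extreme_values(K, gamma_max, gamma_min):
    """Subcondition values at the maximum and minimum extreme points."""
    v_max = K - gamma_max       # form of Eq. (3.1a)
    v_min = -K + gamma_min      # form of Eq. (3.1b)
    return v_max, v_min


def combined_value(K, gamma_max, gamma_min, c):
    """Value at the convex combination (1/c) max + ((c-1)/c) min, c > 1."""
    v_max, v_min = extreme_values(K, gamma_max, gamma_min)
    return v_max / c + (c - 1) / c * v_min


def closed_form(K, gamma_max, gamma_min, c):
    """Rearranged expression after substituting the extreme values."""
    return (2 - c) / c * K - gamma_max / c + (c - 1) / c * gamma_min


# With c = 2 the bound K drops out and only the offsets remain.
K, g_max, g_min = 1.0, 0.3, 0.5
midpoint = combined_value(K, g_max, g_min, 2)      # equals (g_min - g_max) / 2
```

For offsets smaller than $K$ the combined value lies strictly between the two extreme values, which is the first case of the lemma.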

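We further note that there are other cases for which $\hat{h}_p^{\mathrm{opt}}(m)g_q - \delta_{p,q}$ is reduced to a single hyperplane. Consider the case when $\gamma_{k,l}^{\min}(m) = 0$ and $\gamma_{k,l}^{\max}(m) = 2K_{k,l}(m)$ for some index $k, l$ and iteration $m$; then Eqs. (3.1) become

$$\hat{h}_k^{\max}(m)g_l - \delta_{k,l} = K_{k,l}(m) - 2K_{k,l}(m) = -K_{k,l}(m) \tag{3.12a}$$

$$\hat{h}_k^{\min}(m)g_l - \delta_{k,l} = -K_{k,l}(m) + 0 = -K_{k,l}(m) \tag{3.12b}$$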
which also corresponds to a single hyperplane. Furthermore, $\hat{h}_k^{\max}(m)g_l = \hat{h}_k^{\min}(m)g_l = \delta_{k,l} - K_{k,l}(m)$, so $\hat{h}_k$ is normal to $g_l$ only when this value is zero. It can also be shown that we have a single hyperplane if $\gamma_{k,l}^{\min}(m) = 2K_{k,l}(m)$ and $\gamma_{k,l}^{\max}(m) = 0$. Therefore, for any nonempty convex polytope $T$, we must necessarily have

$$0 < \gamma_{p,q}^{\min}(m),\ \gamma_{p,q}^{\max}(m) < 2K_{p,q}(m) \tag{3.13}$$

Lemma 3.2. If the convex polytope $T$ is not empty for the given $m$, then

$$K_{p,q}(m) \le K_{p,q}(m-1), \qquad m \ge 2 \tag{3.14}$$

where $K_{p,q}(m) = \max_{p,q} \left|\hat{h}^{\mathrm{opt}}(m)g - I\right|$.

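Proof. The optimum at iteration $m+1$ yields the result

$$\hat{h}_p^{\mathrm{opt}}(m+1)g_q - \delta_{p,q} = \frac{1}{2}\left(\gamma_{p,q}^{\min}(m) - \gamma_{p,q}^{\max}(m)\right) \tag{3.11}$$

The next bound is selected as

$$K_{p,q}(m+1) = \max_{p,q}\left|\hat{h}^{\mathrm{opt}}(m+1)g - I\right|$$

and is some positive value, necessarily of the form

$$K_{p,q}(m+1) = \frac{1}{2}\left|\gamma_{p_m,q_m}^{\min}(m) - \gamma_{p_m,q_m}^{\max}(m)\right| \tag{3.15}$$

where the subscripts $p_m, q_m$ denote the indices for which $\max_{p,q}\left|\hat{h}^{\mathrm{opt}}(m+1)g - I\right| \ge \left|\hat{h}^{\mathrm{opt}}(m+1)g - I\right|$ for all $p, q$ at the $m$th iteration.

Reinitializing the bounds we have

$$\hat{h}_p^{\max}(m+1)g_q - \delta_{p,q} = +K_{p,q}(m+1) - \gamma_{p,q}^{\max}(m+1) \tag{3.16a}$$

$$\hat{h}_p^{\min}(m+1)g_q - \delta_{p,q} = -K_{p,q}(m+1) + \gamma_{p,q}^{\min}(m+1) \tag{3.16b}$$

and $0 < \gamma_{p,q}^{\min}(m+1), \gamma_{p,q}^{\max}(m+1) < 2K_{p,q}(m+1)$, where $K_{p,q}(m+1)$ is given by Eq. (3.15). In the next iterative step we have

$$\hat{h}_p^{\mathrm{opt}}(m+2)g_q - \delta_{p,q} = \frac{1}{2}\left(\gamma_{p,q}^{\min}(m+1) - \gamma_{p,q}^{\max}(m+1)\right)$$

and consequently

$$K_{p,q}(m+2) = \frac{1}{2}\left|\gamma_{p_{m+1},q_{m+1}}^{\min}(m+1) - \gamma_{p_{m+1},q_{m+1}}^{\max}(m+1)\right| \tag{3.17}$$

The argument can be extended for $K_{p,q}(m+z)$, $z = 1, 2, \ldots$, as long as the convex polytope is not empty.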
Theorem 3.3. For any indices $p, q$ and iteration $m$, if $\hat{h}_p^{\mathrm{opt}}(m)g_q - \delta_{p,q}$ does not reduce any subcondition to a single hyperplane, then

$$\left\|\hat{h}^{\mathrm{opt}}(m)g - I\right\|_\infty \le \left\|\hat{h}^{\mathrm{opt}}(m-1)g - I\right\|_\infty \tag{3.18}$$

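Proof. Follows from Lemma 3.2. Since $T$ is non-empty, we are guaranteed that we reduce the bounds for each subcondition. Since $\left\|\hat{h}^{\mathrm{opt}}(m)g - I\right\|_\infty = \max_p \sum_q \left|\hat{h}_p^{\mathrm{opt}}(m)g_q - \delta_{p,q}\right|$ and $\left|\hat{h}_p^{\mathrm{opt}}(m)g_q - \delta_{p,q}\right| \le \left|\hat{h}_p^{\mathrm{opt}}(m-1)g_q - \delta_{p,q}\right|$ for all $p, q$, then it is also true for the sum and Eq. (3.18) is obviously true. □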


$K_{p,q}(m)$ is a monotonically nonincreasing function. The iterative procedure continues until we collapse one or more of the bounding convex polyhedra to single hyperplanes or we force the convex polytope to be empty. When the bounding convex polyhedra are reduced to single hyperplanes, we proceed with one more iteration step. This can result in an improved solution although the convex polytope is now empty and, consequently, no more iterations are possible as there are no feasible solutions for the simplex method.
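The iteration can be illustrated on a deliberately simplified case. The sketch below is not the authors' simplex-based implementation: it assumes a diagonal $\hat{h}$, a single uniform element-wise bound $K$ in place of the weighted row-sum conditions, and closed-form intervals instead of a linear program, so the minimum and maximum extreme points are just the interval ends. All names are hypothetical.

```python
def feasible_interval(g_row, p, K, tol=1e-12):
    """Interval of a with |a * g_row[q] - delta(p, q)| <= K for all q, or None."""
    lo, hi = float("-inf"), float("inf")
    for q, g in enumerate(g_row):
        d = 1.0 if p == q else 0.0
        if g == 0.0:
            if d > K + tol:          # |0 - d| <= K cannot be satisfied
                return None
            continue
        ends = sorted(((d - K) / g, (d + K) / g))
        lo, hi = max(lo, ends[0]), min(hi, ends[1])
    if lo > hi + tol:                # empty polytope for this variable
        return None
    return lo, hi


def minimax_diagonal(g, K=1.0, max_iter=50, tol=1e-12):
    """Midpoint iteration for diagonal h_hat; g is assumed invertible."""
    n = len(g)
    h_opt = [0.0] * n                # trivial solution, feasible for K = 1
    bounds = [K]
    for _ in range(max_iter):
        intervals = [feasible_interval(g[p], p, K, tol) for p in range(n)]
        if any(iv is None for iv in intervals):
            break                    # no feasible point remains
        # Midway point between the minimum and maximum extreme points.
        h_opt = [0.5 * (lo + hi) for lo, hi in intervals]
        K_new = max(abs(h_opt[p] * g[p][q] - (1.0 if p == q else 0.0))
                    for p in range(n) for q in range(n))
        bounds.append(K_new)
        if K - K_new <= tol:         # bound no longer decreases
            break
        K = K_new
    return h_opt, bounds


h_opt, bounds = minimax_diagonal([[0.5, 0.1], [0.2, 0.25]])
```

In this toy example the recorded bounds decrease from 1 toward 4/9, consistent with the monotone behaviour described above; the first variable settles at a value that is feasible but not individually optimal, which reflects the non-uniqueness noted in Part I.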

It is at this juncture where the Minimax Algorithm departs from the usual procedure for showing the convergence of a sequence of iterations, since the strongest result we can obtain is Theorem 3.3, namely, that $K_{p,q}(m)$ is a monotonically nonincreasing function for nonempty $T$. Because in the last iteration step the answer can be improved although the convex polytope is empty, it is not possible to show that for every case


=  (3.19) 

This  ambiguity  is  not  inconsistent  with  the  procedure  as  at  each  iterative  step  the  same 
process  takes  place.  It  is  the  purpose  of  the  Minimax  Algorithm  to  collapse  the  bounding 
convex  polyhedra  to  single  hyperplanes. 
FIGURES


Table 2.1. Matrix Structure and Interrelation Functions

Matrix Structure    Interrelation Function
Diagonal            $\rho(i) = i$, $\sigma(j) = j$
Circulant           $\rho(i) = \mathrm{mod}_N(i + n_0)$, $\sigma(j) = \mathrm{mod}_N(j + n_0)$
Toeplitz            $\rho(i) = i + n_0$, $\sigma(j) = j + n_0$

Figure  2.2  Flow  graph  for  imposing  constraints  on  system. 


Figure 4.1. Convex polytope for $2 \times 2$ Circulant Example 1. Unshaded area is the region of feasible solutions. The minimum, maximum, and midpoint solutions are shown.